Predicting anomalous crowds, i.e., irregular gatherings of people at unusual events, is an important problem because it enables appropriate countermeasures such as crowd control and preventing the spread of COVID-19.
An existing work [Konishi+, UbiComp’16] proposed a method for predicting irregularities in urban dynamics from transit search query logs. However, predicting anomalous crowds from transit search logs alone is quite unstable because of how transit search is used: some people search for routes but do not attend the event, while others search for the same route two or more times.
Motivated by this, we take into account GPS-based mobility logs, which have been widely used in urban dynamics analysis, in addition to the transit search records.
Previous studies on urban dynamics modeling [Shimosaka+, UbiComp2015] have shown that daily visitor dynamics depend on external contexts such as weather and regular holidays. Therefore, we defined anomalous crowd dynamics as the degree to which the number of visitors deviates from the number usually observed on ordinary days, and proposed a method to predict this degree of crowd anomaly from transit search histories.
We conducted large-scale performance evaluation experiments at 58 points of interest (POIs), including event venues, using real-world mobility logs and transit search histories, and verified the effectiveness of the proposed method in predicting anomalous crowds [Anno+, SigUbi66; Anno+, SIGSPATIAL’20].
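As a rough illustration of this kind of definition, the Python sketch below computes a context-dependent baseline as the mean visitor count for each (weather, holiday) context and measures the anomaly degree as the relative deviation of an observed count from that baseline. The context keys, the function names `context_baseline` and `anomaly_degree`, and the choice of relative deviation are hypothetical assumptions for illustration; the exact formulation used in the papers may differ.

```python
from collections import defaultdict
from statistics import mean


def context_baseline(history):
    """Return the mean visitor count for each (weather, is_holiday) context.

    history: list of (weather, is_holiday, visitors) tuples for past days.
    Assumption (hypothetical): the "usual" level is a plain per-context mean.
    """
    groups = defaultdict(list)
    for weather, is_holiday, visitors in history:
        groups[(weather, is_holiday)].append(visitors)
    return {ctx: mean(vals) for ctx, vals in groups.items()}


def anomaly_degree(visitors, weather, is_holiday, baseline):
    """Relative deviation of an observed count from its usual level in the same context.

    0.0 means a perfectly ordinary day; 2.0 means twice the usual level above baseline.
    """
    usual = baseline[(weather, is_holiday)]
    return (visitors - usual) / usual


history = [("sunny", False, 100), ("sunny", False, 120),
           ("rainy", True, 80), ("rainy", True, 60)]
baseline = context_baseline(history)
print(anomaly_degree(330, "sunny", False, baseline))
```

With past counts of 100 and 120 on sunny non-holidays, an observed count of 330 on a sunny non-holiday gives (330 - 110) / 110 = 2.0, whereas the same count on a day with a different usual level would give a different degree; this is why the baseline is conditioned on context.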
Furthermore, because crowd congestion is mostly caused by non-routine events, crowd gatherings are observed far less often than daily crowd dynamics, so the dataset is imbalanced between daily patterns and congestion patterns.
This imbalance can increase the bias of the forecasting model and cause it to overfit the daily dynamics. More effective safety measures against crowd congestion require “precise forecasts” that quantitatively estimate the number of visitors during congestion, and forecasting models that focus on daily dynamics are not suited to this purpose.
In this study, we focused on importance weighting as a means of suppressing this increase in model bias. Constructing the importance weights requires dividing the users’ scheduled behavioral features, which are the input to the model, into daily patterns and congestion patterns. However, these features vary with context, such as the weather or regular holidays, so this division is a non-trivial task.
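The Python sketch below illustrates generic importance weighting under a simple assumption: each sample's weight is the ratio of a target label distribution (here assumed uniform over "daily" and "congestion") to the empirical label distribution of the training data, and these weights are plugged into a closed-form weighted least-squares fit of a one-dimensional linear model. The helper names, the uniform target distribution, and the linear model are hypothetical choices for illustration, not the weighting actually used in [Anno+ SIGSPATIAL’21].

```python
from collections import Counter


def importance_weights(labels, target=None):
    """Importance weight w_i = p_target(label_i) / p_train(label_i).

    labels: per-sample pattern labels, e.g. "daily" or "congestion".
    target: desired label distribution; assumed uniform over labels if None.
    """
    counts = Counter(labels)
    n = len(labels)
    if target is None:
        target = {c: 1 / len(counts) for c in counts}
    return [target[c] / (counts[c] / n) for c in labels]


def weighted_linear_fit(xs, ys, ws):
    """Closed-form weighted least squares for y ~ a + b * x.

    xs could be, hypothetically, a transit-search count for a POI and
    ys the corresponding visitor count.
    """
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = cov / var
    a = my - b * mx
    return a, b


labels = ["daily"] * 8 + ["congestion"] * 2
print(importance_weights(labels))
```

With 8 daily and 2 congestion samples, each daily sample receives a weight of about 0.625 and each congestion sample about 2.5, so both groups contribute equally to the weighted loss while the weights still sum to the number of samples.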
Therefore, we proposed an annotation scheme for congestion labeling that uses heterogeneous features based on GPS logs to perform this division quantitatively. In this scheme, we use the crowd dynamics recorded in the GPS logs to distinguish daily patterns, which depend on the weather and regular holidays, from congestion patterns, and we pseudo-assign these labels to the scheduled behavioral features. This allows us to divide the data quantitatively in a data-driven manner and to construct the importance weights effectively. Through empirical experiments using real transit search logs and GPS logs at 58 POIs nationwide, we confirmed that the proposed method predicts crowd dynamics during congestion with lower error than existing methods while performing equally well on daily patterns [Anno+, SIGSPATIAL’21].
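A minimal sketch of GPS-based pseudo-labeling, assuming a simple robust outlier rule in place of the actual scheme: within each context (e.g., weather and holiday flag), a day is labeled congestion when its GPS-observed visitor count exceeds the context median by more than `k` robust standard deviations (1.4826 times the median absolute deviation). The rule, the threshold `k`, and the function name `pseudo_label` are hypothetical stand-ins; the annotation scheme in [Anno+ SIGSPATIAL’21] uses heterogeneous features and is not reproduced here.

```python
from collections import defaultdict
from statistics import median


def pseudo_label(contexts, gps_counts, k=3.0):
    """Assign "congestion" / "daily" pseudo-labels from GPS-observed visitor counts.

    contexts: per-day context keys, e.g. (weather, is_holiday) tuples.
    gps_counts: per-day visitor counts observed in the GPS logs.
    A day is "congestion" if its count exceeds the median of its context
    by more than k * 1.4826 * MAD (a robust standard deviation estimate).
    """
    by_ctx = defaultdict(list)
    for ctx, count in zip(contexts, gps_counts):
        by_ctx[ctx].append(count)

    # Robust location and scale of the "usual" level for each context.
    stats = {}
    for ctx, vals in by_ctx.items():
        med = median(vals)
        mad = median(abs(v - med) for v in vals)
        stats[ctx] = (med, 1.4826 * mad)

    labels = []
    for ctx, count in zip(contexts, gps_counts):
        med, scale = stats[ctx]
        labels.append("congestion" if count - med > k * scale else "daily")
    return labels
```

The returned labels would then be attached to the scheduled behavioral (transit search) features of the same POI and day, for example with `list(zip(features, labels))`. When every day in a context has the same count, the MAD is zero and any count above the median is labeled congestion, so a practical version would need a minimum scale.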
Furthermore, the small number of congestion patterns also increases model variance, which makes prediction unstable. To address this issue, we proposed Importance-based Synthetic Oversampling, which extends the importance weighting proposed in [Anno+, SIGSPATIAL’21] with the Synthetic Minority Oversampling Technique [Chawla+, Journal of Artificial Intelligence Research 2002]. This method aims to reduce model variance by synthesizing additional congestion patterns, increasing the number of training samples within a framework that theoretically guarantees the suppression of model bias. We conducted performance evaluation experiments similar to those in [Anno+, SIGSPATIAL’20; Anno+, SIGSPATIAL’21] and confirmed that the proposed method improves predictive performance [Anno+, IEEE Pervasive Computing 2023].
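For reference, the core interpolation step of SMOTE adapted to regression can be sketched in Python as follows: a synthetic sample is placed at a random point on the segment between a minority (congestion) sample and one of its `k` nearest minority neighbors, and the target is interpolated with the same factor. This is plain SMOTE-style oversampling written as the hypothetical helper `smote_regression`; it does not implement the importance-based weighting that distinguishes the authors' Importance-based Synthetic Oversampling.

```python
import math
import random


def smote_regression(X, y, n_new, k=3, seed=0):
    """Generate n_new synthetic (features, target) pairs from minority samples.

    X: list of feature tuples for the minority (e.g. congestion) samples; needs >= 2.
    y: matching list of targets (e.g. visitor counts).
    """
    rng = random.Random(seed)
    synth_X, synth_y = [], []
    for _ in range(n_new):
        i = rng.randrange(len(X))
        # k nearest neighbors of X[i] among the other minority samples.
        neighbors = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:k]
        j = rng.choice(neighbors)
        gap = rng.random()  # same interpolation factor for features and target
        synth_X.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        synth_y.append(y[i] + gap * (y[j] - y[i]))
    return synth_X, synth_y


X = [(1.0,), (2.0,), (4.0,), (8.0,)]
y = [3.0, 5.0, 9.0, 17.0]
new_X, new_y = smote_regression(X, y, n_new=5)
```

Because features and target share the same interpolation factor, if the minority targets lie exactly on a linear function of the features, every synthetic pair lies on that function as well; for nonlinear relationships the interpolated targets are only an approximation.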
Publications
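Advance Prediction of Anomalous Crowds Using GPS Location Histories and Railway Transit Search Histories (in Japanese).
IPSJ SIG Technical Report, the 66th UBI Joint Research Meeting, Online, May 2020.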
Soto Anno, Kota Tsubouchi, and Masamichi Shimosaka.
Supervised-CityProphet: Towards Accurate Anomalous Crowd Prediction.
SIGSPATIAL’20: Proceedings of the 28th International Conference on Advances in Geographic Information Systems, Seattle, Washington, USA, November 2020.
Soto Anno, Kota Tsubouchi, and Masamichi Shimosaka.
CityOutlook: Early Crowd Dynamics Forecast towards Irregular Events Detection with Synthetically Unbiased Regression.
SIGSPATIAL’21: Proceedings of the 29th International Conference on Advances in Geographic Information Systems, Beijing, China, November 2021.
Soto Anno, Kota Tsubouchi, and Masamichi Shimosaka.
CityOutlook+: Early Crowd Dynamics Forecast through Unbiased Regression with Importance-based Synthetic Oversampling.
IEEE Pervasive Computing (to appear).