Spatial extension of the Reality Mining Dataset
Data captured from a live cellular network, with real users going about their daily routines, help to understand how users move within the network. Instead of relying on simulations with limited potential or on expensive experimental studies, research on user mobility and spatio-temporal user behavior can be conducted on publicly available datasets such as the Reality Mining Dataset. For many years, these data have been a valuable source of information about social interconnections between users and about user-network associations. However, an important spatial dimension is missing from this dataset. In this paper, we present a methodology for retrieving the geographical locations that match the GSM cell identifiers in the Reality Mining Dataset; the approach is based on querying the Google Location API. A statistical analysis of the success of location retrieval is provided. Further, we present the LAC-clustering method for detecting and removing outliers, a heuristic extension of general agglomerative hierarchical clustering. This methodology enables further, previously impossible analyses of the Reality Mining Dataset, such as studying user mobility patterns, describing spatial trajectories, and mining spatio-temporal data.