From Big Noise to Big Data: Toward the Verification of Large Data Sets for Understanding Regional Retail Flows


There has been much excitement among quantitative geographers about newly available data sets, characterized by high volume, velocity, and variety. This phenomenon is often labeled "Big Data" and has contributed to methodological and empirical advances, particularly in the areas of visualization and analysis of social networks. However, a fourth v, veracity (or lack thereof), has been conspicuously lacking from the literature. This article sets out to test the potential for verifying large data sets. It does this by cross-comparing three unrelated estimates of retail flows (human movements from home locations to shopping centers) derived from the following geo-coded sources: (1) a major mobile telephone service provider; (2) a commercial consumer survey; and (3) geotagged Twitter messages. Three spatial interaction models also provided estimates of flow: constrained and unconstrained versions of the "gravity model" and the recently developed "radiation model." We found positive relationships between all data-based and theoretical sources of estimated retail flows. Based on the analysis, the mobile telephone data fitted the modeled flows and consumer survey data closely, while flows obtained directly from the Twitter data diverged from other sources. The research highlights the importance of verification in flow data derived from new sources and demonstrates methods for achieving this.
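To make the idea of cross-comparing flow estimates concrete, the following is a minimal sketch, not the article's implementation. It assumes a production-constrained gravity model with a power-law distance decay, and it compares the modeled flows with a second set of flows using Pearson's correlation. The function names, the decay exponent `beta`, and all numbers are hypothetical and chosen purely for illustration.

```python
import math

def constrained_gravity(origins, attractiveness, distances, beta=2.0):
    """Production-constrained gravity model (illustrative assumption).

    Each origin's total outflow is split across destinations in proportion
    to attractiveness * distance ** -beta, so every row sums to its origin total.
    """
    flows = []
    for outflow, row in zip(origins, distances):
        weights = [a * d ** -beta for a, d in zip(attractiveness, row)]
        total = sum(weights)
        flows.append([outflow * w / total for w in weights])
    return flows

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of flows."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical toy inputs: two home zones and two shopping centers.
origins = [1000.0, 500.0]             # shoppers leaving each home zone
attractiveness = [3.0, 1.0]           # e.g. relative retail floorspace
distances = [[2.0, 4.0], [5.0, 1.0]]  # zone-to-center distances

modeled = constrained_gravity(origins, attractiveness, distances)

# Hypothetical "observed" flows standing in for a data-derived estimate.
observed = [[900.0, 100.0], [60.0, 440.0]]

flat_modeled = [f for row in modeled for f in row]
flat_observed = [f for row in observed for f in row]
agreement = pearson(flat_modeled, flat_observed)
```

A correlation close to 1 would indicate that the two sources rank and scale origin-destination flows similarly; repeating the comparison for each pair of sources gives a simple picture of which estimates agree and which diverge.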

Geographical Analysis