When is Big Data Big Enough? Implications of Using GPS-Based Surveys for Travel Demand Analysis

Akshay Vij and Kalyanaraman Shankari

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2014-141
August 1, 2014

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-141.pdf

A number of studies in the last decade have argued that GPS-based surveys offer the potential to replace traditional travel diary surveys. GPS-based surveys impose lower respondent burden, offer greater spatiotemporal precision and incur fewer monetary costs. However, GPS-based surveys do not collect certain key inputs required for the estimation of travel demand models, such as the travel mode(s) taken or the trip purpose, relying instead on data-processing procedures to infer this information. This study assesses the impact that errors in inference can have on travel demand models estimated using data from GPS-based surveys. We use simulated datasets to compare performance across different sample sizes, inference accuracies and estimation methods. Findings from the simulated datasets are corroborated with real data collected from individuals living in the San Francisco Bay Area, United States. Results indicate that the benefits of using GPS-based surveys will vary significantly, depending upon the sample size of the data, the accuracy of the inference algorithm and the desired complexity of the travel demand model specification. In many cases, gains in the volume of data that can potentially be retrieved using GPS devices may be offset by the loss in quality caused by inaccuracies in inference. For example, a Monte Carlo experiment finds that a relatively parsimonious model of travel mode choice behavior that could reliably be estimated using 100 high-quality observations could need 10,000 observations and more, depending upon the accuracy of the inference algorithm. This study argues that GPS-based surveys may never entirely replace traditional travel diary surveys. For data from GPS-based surveys to be useful for existing modes of travel demand analysis, it needs either to be supplemented with data collected from traditional surveys or GPS-based surveys need to allow for direct interaction with the study participant. Alternatively, newer modes of analysis need to be developed that can compensate for inaccuracies in data from existing GPS-based surveys.
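
The effect of inference errors described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the experiment from the report: it assumes a binary mode choice (for example, car versus transit) driven by a single standardized covariate, a true coefficient of 1.0, and an inference algorithm that mislabels the chosen mode symmetrically with probability 1 - accuracy. The function names `simulate` and `fit_logit` and all parameter values are hypothetical and chosen only for illustration. The model is then estimated naively, treating the inferred labels as if they were correct.

```python
import math
import random


def sigmoid(z):
    """Logistic function used by the binary logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, beta, accuracy, rng):
    """Simulate n binary mode choices and corrupt the observed labels.

    Assumption: each true choice is flipped with probability 1 - accuracy,
    mimicking an imperfect mode-inference algorithm.
    """
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                      # standardized covariate
        y = 1 if rng.random() < sigmoid(beta * x) else 0
        if rng.random() > accuracy:                  # inference error
            y = 1 - y
        xs.append(x)
        ys.append(y)
    return xs, ys


def fit_logit(xs, ys, iters=25):
    """Maximum-likelihood estimate of a one-parameter binary logit
    (no intercept), computed with Newton-Raphson iterations."""
    b = 0.0
    for _ in range(iters):
        grad = 0.0
        hess = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b * x)
            grad += (y - p) * x                      # score
            hess += p * (1.0 - p) * x * x            # negative Hessian
        if hess == 0.0:
            break
        b += grad / hess
    return b


if __name__ == "__main__":
    rng = random.Random(42)
    for accuracy in (1.0, 0.9, 0.8):
        xs, ys = simulate(2000, 1.0, accuracy, rng)
        print(f"accuracy={accuracy:.1f}  estimated beta={fit_logit(xs, ys):.3f}")
```

Under these assumptions, the naive estimate is pulled toward zero as the inference accuracy falls, and collecting more observations does not by itself remove that bias. This is one way to see why a larger volume of GPS data may fail to compensate for lower label quality unless the estimation method accounts for the inference errors.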


BibTeX citation:

@techreport{Vij:EECS-2014-141,
    Author = {Vij, Akshay and Shankari, Kalyanaraman},
    Title = {When is Big Data Big Enough? Implications of Using GPS-Based Surveys for Travel Demand Analysis},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2014},
    Month = {Aug},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-141.html},
    Number = {UCB/EECS-2014-141},
    Abstract = {A number of studies in the last decade have argued that GPS-based surveys offer the potential to replace traditional travel diary surveys. GPS-based surveys impose lower respondent burden, offer greater spatiotemporal precision and incur fewer monetary costs. However, GPS-based surveys do not collect certain key inputs required for the estimation of travel demand models, such as the travel mode(s) taken or the trip purpose, relying instead on data-processing procedures to infer this information. This study assesses the impact that errors in inference can have on travel demand models estimated using data from GPS-based surveys. We use simulated datasets to compare performance across different sample sizes, inference accuracies and estimation methods. Findings from the simulated datasets are corroborated with real data collected from individuals living in the San Francisco Bay Area, United States. Results indicate that the benefits of using GPS-based surveys will vary significantly, depending upon the sample size of the data, the accuracy of the inference algorithm and the desired complexity of the travel demand model specification. In many cases, gains in the volume of data that can potentially be retrieved using GPS devices may be offset by the loss in quality caused by inaccuracies in inference. For example, a Monte Carlo experiment finds that a relatively parsimonious model of travel mode choice behavior that could reliably be estimated using 100 high-quality observations could need 10,000 observations and more, depending upon the accuracy of the inference algorithm. This study argues that GPS-based surveys may never entirely replace traditional travel diary surveys. For data from GPS-based surveys to be useful for existing modes of travel demand analysis, it needs either to be supplemented with data collected from traditional surveys or GPS-based surveys need to allow for direct interaction with the study participant. Alternatively, newer modes of analysis need to be developed that can compensate for inaccuracies in data from existing GPS-based surveys.}
}

EndNote citation:

%0 Report
%A Vij, Akshay
%A Shankari, Kalyanaraman
%T When is Big Data Big Enough? Implications of Using GPS-Based Surveys for Travel Demand Analysis
%I EECS Department, University of California, Berkeley
%D 2014
%8 August 1
%@ UCB/EECS-2014-141
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-141.html
%F Vij:EECS-2014-141