Leveraging Big Data in the Public Transportation Industry
Big Data is a general term for very large, unstructured data sets that typically need to be cataloged and structured before they can be analyzed for insights. These insights can help solve problems, improve decisions, and increase efficiency. In routine operations, transit agencies generally collect several types of data, such as vehicle locations, fare transactions, and passenger counts, that can be considered Big Data. Transit agencies have been applying advanced data analysis techniques, such as machine learning, to unlock insights from these data. In 2018, the American Public Transportation Association (APTA) conducted a survey and a series of discussions with transit agencies across the United States to help agencies better understand how to use Big Data. Results from the industry interviews and survey responses from 71 transit agencies were then distilled into a series of Big Data best practices for transit agencies:
- Promote a Culture of Data Analysis Within Agencies: Transit agency staff are often the most important component of Big Data analysis; without adequately trained staff, agencies face obstacles in analyzing and using their data. Transit agencies should work to promote a culture of evidence-based decision-making and data analysis, with support from leadership.
- Consider How Data Will Be Standardized at an Early Stage: Lack of data standardization is a major obstacle to Big Data analysis, with 69 percent of survey respondents indicating it was an issue. To mitigate this obstacle, agencies should consider up front how they will verify, standardize, and store their data. For example, one agency invested in a single data warehouse, which helped verify and standardize data collected from multiple sources and provided consistent access.
- Find Ways to Work with Third Parties: Third-party transportation providers, such as transportation network companies (TNCs) or micromobility services, often hold valuable data, such as travel demand patterns, that can be of great use to transit agencies. However, these companies may not want to share their data with transit agencies for a variety of reasons, including privacy concerns. Agencies should find ways to address these concerns, for example through anonymization, so that they can work with third-party data sets.
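
As an illustration of the anonymization mentioned in the last practice, the following is a minimal Python sketch of one possible approach, not a method described in the APTA survey. It assumes a hypothetical trip record with `rider_id`, `start_time`, `lat`, and `lon` fields; the function names, the `SECRET_KEY` value, and the grid size are likewise assumptions made for this example. The sketch replaces the rider identifier with a keyed hash and coarsens the time and location before the record is shared. Steps like these reduce re-identification risk but do not by themselves guarantee anonymity.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical secret kept by the data provider and never shared with the agency.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymize_id(rider_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a raw identifier with a truncated keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, rider_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def coarsen_trip(trip: dict, grid: float = 0.01) -> dict:
    """Return a copy of the trip with less precise time and location.

    The start time is truncated to the hour and the coordinates are snapped
    to a grid of `grid` degrees (0.01 degrees is roughly 1 km of latitude).
    """
    start = datetime.fromisoformat(trip["start_time"])
    return {
        "rider": pseudonymize_id(trip["rider_id"]),
        "hour": start.replace(minute=0, second=0, microsecond=0).isoformat(),
        "lat": round(round(trip["lat"] / grid) * grid, 4),
        "lon": round(round(trip["lon"] / grid) * grid, 4),
    }


# Hypothetical record, for illustration only.
trip = {
    "rider_id": "user-1042",
    "start_time": "2018-05-14T08:37:12",
    "lat": 38.90719,
    "lon": -77.03687,
}
shared = coarsen_trip(trip)
print(shared)
```

Because the same rider always maps to the same pseudonym, the agency can still study repeat travel patterns without seeing the original identifier. A coarser grid or wider time window trades analytical detail for stronger privacy protection.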