A Market Scan Study on Emerging Data Science for Transit Cited the Importance of Data Hygiene in Enabling Advanced Data Analytics.
Transit providers in the U.S. collect a large volume of data from Intelligent Transportation Systems (ITS) technologies, including Automatic Vehicle Location (AVL), Automatic Passenger Counter (APC), and Automatic Fare Collection (AFC) equipment. Transportation agencies can leverage these data to improve planning, operations, and safety. This study described the state of the practice in the use of emerging data science tools and methodologies among U.S. transit agencies. It identified and investigated common challenges and opportunities, as well as ways in which data science and ITS technology capabilities could be used to address those challenges. Specifically, the study researched ITS applications that support asset health monitoring and predictive maintenance, occupancy monitoring, operational efficiency, and performance management and planning. The study combined interviews with transit agencies and a literature review of existing ITS technologies. Two separate groups of transit agencies were interviewed. The first group, which comprised seven transit agencies and one partnering transit lab, focused on agencies already conducting innovative data science work. The second group included four agencies whose prior innovative work showed an interest in emerging data science practices.
The lessons learned from this study are presented below.
- Maintain a high level of “data hygiene.” To perform advanced data analysis and deliver high-quality results, agencies should clean their data regularly so that it is in a usable format. This may require a significant time investment, but it is crucial for applying emerging data science methods in public transportation.
- Explore opportunities to unlock latent value in available datasets. Transit systems and agencies already produce extensive data. Integrating and exploring these existing datasets may offer as much value as, or more value than, new data collection efforts or novel analysis approaches.
- Establish an open-source data platform. Several agencies interviewed in this study mentioned that an open data platform could encourage information exchange and help develop emerging practices. Publicly available data also make it easier for agencies to partner with research institutions and transit labs.
- Recognize the gap between theory and practice in advanced data science. The interviews and literature review conducted in this study revealed that, although academic research offered many examples of how machine learning could benefit transit planning and operations, practical, on-the-ground applications were more limited.
- Recognize the importance of transit-specific domain knowledge and technical expertise. Many interviewed agencies emphasized the importance of staff with transit-specific expertise and experience. It is critical for data scientists to understand transit data, which can be complex and non-intuitive to new users. Agencies suggested that familiarity with the transit sector was as important as experience and expertise in data science, an observation that applied to in-house staff as well as to vendors and consultants.
- Use vendors strategically, with realistic expectations. Vendors and consultants can provide invaluable support to agencies that have limited in-house resources for applying emerging practices. For both transit agencies and vendors/consultants, approaching new data-focused projects with realistic expectations was considered critical to success.
- Leverage informal peer networks. The interviewed agencies stated that they relied heavily on informal networks of counterparts at peer agencies to learn about new approaches and troubleshoot issues.