Connected Vehicle Pilot Deployment Program
Driving Towards Deployment: Lessons Learned from the Design/Build/Test Phase
In September of 2015, USDOT selected New York City Department of Transportation (NYCDOT), Wyoming Department of Transportation (WYDOT) and Tampa Hillsborough Expressway Authority (THEA) as the recipients of a combined $42 million in federal funding to implement a suite of connected vehicle applications and technologies tailored to meet their region’s unique transportation needs under the Connected Vehicle Pilot Deployment Program.
Following the award, each site spent 12 months preparing a comprehensive deployment concept to ensure rapid and efficient connected vehicle capability roll-out. The sites next completed a 24-month phase to design, build, and test these deployments of integrated wireless in-vehicle, mobile device, and roadside technologies. As of early 2019, the sites are entering a third phase of the deployment where the tested connected vehicle systems will become operational for a minimum 18-month period and will be monitored on a set of key performance measures.
Given the promising future of connected vehicle deployments and the growing early deployer community, experiences and insights across all stages of the Design/Build/Test Phase of the CV Pilots have been collected to serve as lessons learned and recommendations for future early deployer projects and efforts.
The following lessons were identified regarding sensitivities around the type and amount of data to be collected and the need for a data governance framework that outlines how data will be collected, managed, and archived.
Assess data collection needs and requirements
- While the CV system can provide terabytes of data, it is important to understand what data is needed, for what purposes, and where. Data collection must be scalable and sustainable and should provide value during system operation. For example, in an urban environment, if an RSU records and uploads to the TMC every Basic Safety Message (BSM) it hears, each instrumented vehicle traveling at 25-30 MPH will typically generate over 500 BSMs while within range of that RSU. If the goal is to compute travel times between RSUs, then a single vehicle will result in the TMC receiving on the order of 1,000 BSMs under free-flow conditions (and this can easily double or triple if the vehicle is stopped). In reality, to compute travel times, the TMC only needs a single BSM from a configured zone within each intersection to begin the matching operation and measurement of travel times. The result is a 99.8 percent reduction in network data flow, and a similar reduction in processing at the TMC.
Further, consider the scalability problem with processing travel time data. If every vehicle were equipped, the TMC’s task would be unmanageable at a reasonable cost. The NYC pilot project had to address these issues due to the expected density of CV-equipped vehicles, the limited backhaul bandwidth, and the limited processing power at the TMC. Travel times are a critical element of the City’s adaptive control system, and using the RSU to determine when a vehicle is within a small zone at the intersection makes it possible to compute them. Likewise, as one looks to more sophisticated local monitoring, the combination of the RSU and the Advanced Transportation Controllers (ATC) can convert the data streams into usable information, such as queue lengths, and share it with the TMC to improve the allocation of phase time, progression, and platoon management.
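The zone-based reduction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the NYC pilot's implementation: the `Bsm` record, its stable `vehicle_id`, the local x/y coordinates, and the 10-metre square zone are all hypothetical simplifications (real BSMs carry a rotating temporary ID and latitude/longitude).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bsm:
    vehicle_id: str    # hypothetical stable ID; real BSMs carry a rotating temporary ID
    timestamp: float   # seconds
    x: float           # metres from the intersection reference point (assumed local frame)
    y: float


def in_zone(bsm, half_width=10.0):
    """True if the BSM position lies in a square zone centred on the intersection."""
    return abs(bsm.x) <= half_width and abs(bsm.y) <= half_width


def first_bsm_in_zone(bsms):
    """Keep one BSM per vehicle: the first one heard inside the zone."""
    kept = {}
    for bsm in sorted(bsms, key=lambda b: b.timestamp):
        if bsm.vehicle_id not in kept and in_zone(bsm):
            kept[bsm.vehicle_id] = bsm
    return list(kept.values())


# One vehicle heard 500 times at 10 Hz while driving through at 12 m/s (about 27 MPH).
heard = [Bsm("veh-1", i * 0.1, -300.0 + 1.2 * i, 0.0) for i in range(500)]
forwarded = first_bsm_in_zone(heard)
reduction = 1 - len(forwarded) / len(heard)
print(f"forwarded {len(forwarded)} of {len(heard)} BSMs ({reduction:.1%} reduction)")
```

With these made-up inputs, the RSU forwards one message out of 500, which corresponds to the 99.8 percent figure cited above.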
Have a plan for how the data will be handled both during and post-deployment
- Connected vehicle, mobile device, and infrastructure sensor data captured during the operational phase of the Pilots was required to be shared with the independent evaluator (IE) in support of the broader evaluation. In addition, data stripped of personally identifiable information (PII) was required to be posted on the ITS Public Data Hub. However, uncertainties regarding data ownership led to concerns among the sites over subpoenas. After some back-and-forth on the issue, specific language was developed that clarified protections for the data. All CV data sent to the IE was placed under protection from PII disclosure and from the potential exposure of privacy-related tracking information.
Regarding the fate of the data post-Pilots, the USDOT plans to follow the standard data access and retention contract language for JPO-funded projects, which states that JPO-funded data should be retained in a research data access system for two years past the date of original data collection. If there proves to be sufficient value in retaining the data past that point, it will be done on a case-by-case basis. This could include transferring the data to a more persistent operational archive.
Implement data collection procedures and techniques that reduce the burden on the communications network and account for the limitations of backhaul bandwidth
- All municipal systems within New York City utilize the New York City Wireless Network (NYCWiN), limiting the bandwidth available to the NYC CV Pilot. While the Tampa and Wyoming pilots are collecting vehicle data continuously, the NYC Pilot is only doing event-based data collection to address these limitations. Whenever a configurable event occurs (e.g., hard braking, steering turns, or hard acceleration), all BSMs for a configurable amount of time before and after the event are combined and encrypted into what becomes an "event" record.
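As a rough sketch of how event-based collection of this kind might work, the example below keeps a short rolling buffer of recent BSMs and, when a trigger fires, bundles the buffered messages with those that follow. Everything here is an assumption for illustration: the `EventRecorder` class, the hard-braking threshold of -4.0 m/s², and the window lengths are hypothetical, BSMs are reduced to `(time_ms, accel)` tuples, and the encryption step mentioned above is omitted.

```python
from collections import deque


class EventRecorder:
    """Bundle BSMs around a trigger into one record (illustrative only)."""

    def __init__(self, pre_ms, post_ms, hard_brake_mps2=-4.0):
        self.pre_ms = pre_ms              # configurable window before the event
        self.post_ms = post_ms            # configurable window after the event
        self.threshold = hard_brake_mps2  # assumed trigger: strong deceleration
        self.buffer = deque()             # rolling pre-event buffer
        self.pending = None               # (event_time_ms, messages) while a record is open
        self.records = []                 # completed event records

    def on_bsm(self, time_ms, accel):
        bsm = (time_ms, accel)
        if self.pending is not None:
            event_time, messages = self.pending
            if time_ms <= event_time + self.post_ms:
                messages.append(bsm)           # still inside the post-event window
            else:
                self.records.append(messages)  # window elapsed: close the record
                self.pending = None
        # Maintain the rolling buffer of recent messages.
        self.buffer.append(bsm)
        while self.buffer[0][0] < time_ms - self.pre_ms:
            self.buffer.popleft()
        # Open a new record on a trigger (triggers inside an open record are merged).
        if self.pending is None and accel <= self.threshold:
            self.pending = (time_ms, list(self.buffer))


# 10 seconds of BSMs at 10 Hz with a single hard-braking sample at t = 5000 ms.
recorder = EventRecorder(pre_ms=1000, post_ms=2000)
for i in range(100):
    recorder.on_bsm(i * 100, -5.0 if i == 50 else 0.0)

record = recorder.records[0]
print(len(recorder.records), len(record), record[0][0], record[-1][0])
```

In this toy run, 31 of the 100 messages end up in the single event record (one second before the trigger through two seconds after it); the rest are never uploaded.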
CV infrastructure naturally provides the opportunity for edge processing and the aggregation of CV information to foster better mobility. NYC looked to incorporate edge computing concepts into its data management plans to further address its need for more scalable data collection. As opposed to having all data processing occur at the TMC, New York City designed its system architecture to have some data processing occur at "edge" devices (RSUs, OBUs). By performing local processing at the edge instead of streaming all the data to a central cloud for processing, NYC was able to reduce the amount of bandwidth used.
Plan accordingly for data storage requirements
- Preliminary vehicle, mobile device, and infrastructure data estimates should be calculated early on to determine the data storage systems needed (including CPU and disk needs). Note that the estimate for interactions between CVs is highly dependent on how often connected vehicles will be traveling within range of each other and interacting. Also note that fleet vehicles may have higher daily operational hours than private passenger vehicles and produce proportionally more data.
During the data collection period, the magnitude of raw and processed data volume should be closely monitored over time to anticipate and respond to any additional data storage needs, such as increasing storage at the TMC or changing the frequency at which devices upload data.
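A back-of-the-envelope estimate of the kind described above might look like the following. All inputs are placeholders rather than figures from the pilots: the fleet sizes, operating hours, and 300 bytes per logged BSM are assumptions, and the 10 messages per second reflects the nominal BSM broadcast rate. The sketch covers only each vehicle's own continuously logged BSMs, not messages heard from other vehicles.

```python
def estimate_daily_bytes(vehicles, hours_per_day, bsm_hz=10, bytes_per_bsm=300):
    """Raw bytes per day if every vehicle logs its own BSMs continuously.

    bsm_hz and bytes_per_bsm are assumed values; replace them with measured ones.
    """
    seconds_per_day = hours_per_day * 3600
    return vehicles * seconds_per_day * bsm_hz * bytes_per_bsm


# Hypothetical comparison: 1,000 fleet vehicles vs. 1,000 private vehicles.
fleet = estimate_daily_bytes(vehicles=1000, hours_per_day=12)
private = estimate_daily_bytes(vehicles=1000, hours_per_day=1.5)

print(f"fleet:   {fleet / 1e9:.1f} GB/day")
print(f"private: {private / 1e9:.1f} GB/day")
```

Because the estimate is linear in operating hours, the hypothetical fleet vehicles here produce eight times the data of the private vehicles, which illustrates why fleet composition matters when sizing storage.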
Adopt a metadata standard that all data providers agree to and comply with
- Metadata standards defining what needs to be included in the metadata associated with a data set should be adopted for all data that is uploaded for evaluation/public consumption.
Uploads of preliminary sample data to USDOT’s Secure Data Commons (SDC) Portal, a cloud-based analytic sandbox, were unorganized and lacked critical data dictionaries that the independent evaluator (IE) needed. To prevent further undocumented data in the SDC, the IE eventually incorporated a "form" of contextual data that the sites were required to fill out for every new table or data type uploaded to the platform.
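One way such a requirement could be enforced automatically is a simple check run before each upload. The sketch below is purely illustrative: the field names in `REQUIRED_FIELDS` and the `missing_metadata` helper are hypothetical and do not reflect the actual form used by the IE or any SDC interface.

```python
# Hypothetical set of fields a metadata form might require.
REQUIRED_FIELDS = (
    "table_name",
    "description",
    "source_site",
    "collection_period",
    "data_dictionary",
)


def missing_metadata(form, columns):
    """Return a list of problems; an empty list means the form looks complete."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not form.get(name)]
    dictionary = form.get("data_dictionary") or {}
    problems += [f"undocumented column: {col}" for col in columns if col not in dictionary]
    return problems


form = {
    "table_name": "bsm_events",
    "description": "Event records built from BSMs",
    "source_site": "example-site",
    "collection_period": "2019-01",
    "data_dictionary": {"timestamp": "UTC time of the message"},
}
problems = missing_metadata(form, columns=["timestamp", "speed"])
print(problems)
```

Here the check flags the `speed` column because it has no entry in the data dictionary, which is the kind of gap the form was meant to prevent.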