Developing Data Fusion Techniques: PSU Makes Progress Toward Applying Machine Learning Methods to Estimate Bike Counts


Over the past several years, in a series of research projects, researchers at Portland State University (PSU) have been developing a new approach to estimate active transportation volumes using machine learning.

This emerging method, which can predict how many people will be biking or walking on any given road, trail or segment of a transportation network at any time, offers promising applications for transportation agencies and state departments of transportation (DOTs). These organizations can use accurate bicycle and pedestrian volume information to track changes over time, prioritize projects, plan and design new infrastructure, conduct safety analyses and estimate public health impacts.

"These methods are still evolving, and it's still in the research phase. But I think the time is not far off when we will start using these methods as more mainstream," said Sirisha Kothuri of the Maseeh College of Engineering and Computer Science, the lead researcher on this series of projects.

The method Kothuri and other researchers are developing is referred to as "data fusion" because it involves combining multiple data sources, including traditional permanent and short-term counting methods as well as newer crowdsourced data streams from entities like Strava and StreetLight.

HOW DOES IT WORK?

Traditional permanent and short-term counting methods provide counts directly, but are limited to certain locations or short periods of time. Meanwhile, crowdsourced data (such as Strava or StreetLight) can cover a wider area but with less accuracy, as it only captures a subset of users.

Fusing the two data sources together, potentially with the use of deep learning algorithms, is a promising way to get the best of both.

The researchers train a computer model on existing count data from some locations, then use the trained model to predict volumes at other locations whose count data the model has not seen. They then compare the model's predictions with the actual counts at those locations to see how accurate the model is.
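The train-then-check procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the records are made-up numbers, the field names (`site`, `crowd`, `count`) and function names are hypothetical, and a simple least-squares line stands in for the deep learning models used in the research. The point it shows is that whole sites are held out, so the model is scored only on locations it never saw.

```python
import random
from statistics import mean


def split_by_site(records, test_fraction=0.25, seed=0):
    """Hold out whole sites so the model is scored on unseen locations."""
    sites = sorted({r["site"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(sites)
    n_test = max(1, round(len(sites) * test_fraction))
    test_sites = set(sites[:n_test])
    train = [r for r in records if r["site"] not in test_sites]
    test = [r for r in records if r["site"] in test_sites]
    return train, test


def fit_line(train):
    """Least-squares fit of count = a + b * crowd (a stand-in model)."""
    xs = [r["crowd"] for r in train]
    ys = [r["count"] for r in train]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def mape(model, test):
    """Mean absolute percentage error of predictions on held-out records."""
    a, b = model
    return mean(abs((a + b * r["crowd"]) - r["count"]) / r["count"] for r in test)


# Made-up records: crowdsourced trips and observed counts at eight sites.
records = [
    {"site": s, "crowd": c, "count": n}
    for s, c, n in [
        ("A", 10, 105), ("A", 14, 150), ("B", 22, 230), ("B", 30, 290),
        ("C", 5, 60), ("C", 8, 95), ("D", 40, 410), ("D", 44, 430),
        ("E", 17, 160), ("E", 19, 205), ("F", 26, 250), ("F", 33, 345),
        ("G", 12, 130), ("G", 9, 85), ("H", 36, 350), ("H", 38, 400),
    ]
]

train, test = split_by_site(records)
model = fit_line(train)
error = mape(model, test)
print(f"held-out sites: {sorted({r['site'] for r in test})}, MAPE: {error:.1%}")
```

Splitting by site rather than by individual record matters here: if records from the same counter appeared in both groups, the accuracy check would say little about locations without any counts.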

Using long short-term memory networks and deep neural networks, the method combines static variables (such as network characteristics, demographics, and land use) with dynamic crowdsourced data and count data from different regions. The research has shown that crowdsourced data alone cannot replace traditional count data. For the method to work, both are necessary.
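As a rough illustration of what "combining" means at the data level, the sketch below joins static site attributes with monthly crowdsourced observations and pairs each resulting feature row with an observed count. Everything in it is an assumption made for illustration: the variable names, the feature names (`bike_lane`, `pop_density`, `strava_trips`) and the numbers are hypothetical, and the actual inputs and preprocessing used in the research may differ.

```python
# Static variables: one record per site, unchanged from month to month.
static_by_site = {
    "A": {"bike_lane": 1, "pop_density": 4200.0},
    "B": {"bike_lane": 0, "pop_density": 900.0},
}

# Dynamic crowdsourced data: one record per site and month.
crowd_monthly = [
    {"site": "A", "month": 6, "strava_trips": 310},
    {"site": "A", "month": 7, "strava_trips": 355},
    {"site": "B", "month": 6, "strava_trips": 40},
    {"site": "C", "month": 6, "strava_trips": 75},  # no static data for C
]

# Traditional counts (the training target), keyed by (site, month).
counts = {("A", 6): 1250, ("A", 7): 1400, ("B", 6): 180}


def build_rows(static_by_site, crowd_monthly, counts):
    """Fuse static and dynamic features into (features, target) pairs."""
    rows = []
    for obs in crowd_monthly:
        key = (obs["site"], obs["month"])
        if key not in counts or obs["site"] not in static_by_site:
            continue  # skip site-months lacking a count or static attributes
        features = {
            **static_by_site[obs["site"]],
            "month": obs["month"],
            "strava_trips": obs["strava_trips"],
        }
        rows.append((features, counts[key]))
    return rows


rows = build_rows(static_by_site, crowd_monthly, counts)
for features, target in rows:
    print(features, "->", target)
```

Rows like these are what a model would be trained on; the crowdsourced column alone would leave out the site context and the ground-truth counts that the research found to be necessary.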

Regional data is also key to the success of the model: the more local count data the model can be trained on, the better its accuracy will be for the area in which it will be used.

The models tend to fare better when using Monthly Average Daily Bicyclists (MADB) as a target, rather than Annual Average Daily Bicyclists (AADB), because breaking each counter's data down into monthly units gives the models more data points to work with.
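The difference in data volume is easy to see in a small sketch. The code below assumes the simplest possible definitions, with MADB as the plain average of daily counts within a month and AADB as the plain average across the year; agencies may use more elaborate averaging schemes. The daily counts are synthetic and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def synthetic_daily_counts(year):
    """Made-up daily counts for one counter, peaking in mid-year."""
    series = []
    day = date(year, 1, 1)
    while day.year == year:
        series.append((day, 50 + 20 * min(day.month, 13 - day.month)))
        day += timedelta(days=1)
    return series


def madb(daily):
    """Monthly Average Daily Bicyclists: one value per calendar month."""
    by_month = defaultdict(list)
    for day, count in daily:
        by_month[day.month].append(count)
    return {month: mean(values) for month, values in sorted(by_month.items())}


def aadb(daily):
    """Annual Average Daily Bicyclists: a single value for the year."""
    return mean(count for _, count in daily)


daily = synthetic_daily_counts(2023)
monthly = madb(daily)
annual = aadb(daily)
print(f"{len(monthly)} MADB targets versus 1 AADB target from the same counter")
```

One year of data from a single permanent counter yields twelve training targets under MADB but only one under AADB, which is the effect the researchers describe.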

"Basically, the more data a model has, the smarter it gets," said Banafsheh Rekabdar, an Assistant Professor of Computer Science in the Maseeh College of Engineering and Computer Science who worked with Kothuri on the latest project.

The graphic below offers an overview of the path of data from original sources as it moves through the process developed by the researchers:

A SERIES OF RESEARCH EFFORTS FUNDED BY MULTIPLE ORGANIZATIONS 

These research efforts got underway in 2018 with funding from the National Institute for Transportation and Communities (NITC). NITC launched a pooled fund project with support from the DOTs of Oregon, Virginia, Colorado, Utah, and the District of Columbia, as well as Central Lane MPO and the Cities of Portland and Bend, Oregon. With matching funds from NITC, those agencies came together to fund the initial project, "Exploring Data Fusion Techniques to Estimate Network-Wide Bicycle Volumes," with a research team led by Kothuri and made up of researchers from PSU and the University of Texas at Arlington. The objective of this study was to fuse traditional count data with crowdsourced, land use and sociodemographic data to estimate bicycle volumes on a network. It was the first large-scale study of its kind to include data from multiple regions and years to generate bicycle volumes using data fusion techniques.

Next came "Estimating Bicyclist Volumes with Crowdsourced Data," a study funded by the Washington State Department of Transportation (WSDOT), which built on the initial efforts and focused on the transferability of the bicycle volume models estimated as part of the NITC pooled fund study. In a case study for this project, the researchers showed how data fusion techniques can estimate bicycle volumes for specific high-risk crash corridors rather than the entire network, which can be a critical input for safety analyses.

Kothuri and her team then turned to another NITC study, which adapted the bicycle volume estimation techniques to the pedestrian context. This study used data fusion techniques to combine crowdsourced data (Strava pedestrian data) with static contextual data to model 2-hour PM peak pedestrian volumes.

On the bike side, the WSDOT study was followed by a NITC technology transfer initiative aimed at improving the accuracy of the bicycle volume estimates using machine learning techniques.

The latest report to come out of these efforts, "Improving the Accuracy and Precision of Bicycle Volume Estimates Using Advanced Machine Learning Approaches" (PDF) by Sirisha Kothuri, Banafsheh Rekabdar and Joe Broach of Portland State University, moved the needle on using advanced techniques to extrapolate data over a large transportation network. Two PSU graduate students also worked on the project: Saba Izadkhah, who is working toward a PhD in computer science, and Andrew Wagner, a computer science master's student.

A paper based on this work was presented at the Institute of Electrical and Electronics Engineers' International Conference on Artificial Intelligence x Science, Engineering and Technology at the beginning of October. Kothuri also presented updates on the data fusion method at the 2024 Pacific Northwest Transportation Consortium (PacTrans) Conference.

"We know that for pedestrians, injuries and fatalities are at an all time high. Bicyclist safety is also of top concern. So these estimates are really critical for agencies right now," Kothuri said.

Portland State University's Transportation Research and Education Center (TREC) is a multidisciplinary hub for all things transportation. We are home to the Initiative for Bicycle and Pedestrian Innovation (IBPI), the data programs PORTAL and BikePed Portal, the Better Block PSU program, and PSU's membership in PacTrans, the Pacific Northwest Transportation Consortium. Our continuing goal is to produce impactful research and tools for transportation decision makers, expand the diversity and capacity of the workforce, and engage students and professionals through education, seminars, and participation in research. To get updates about what's happening at TREC, sign up for our monthly newsletter or follow us at the links below.

 BlueSky  |  Instagram  |  LinkedIn  |  Facebook  |  TikTok  |  YouTube
