Is it possible to combine new data sources, such as road sensors on motorways, GPS data and camera footage, with information from questionnaires? And, subsequently, is it possible to estimate how goods are transported over the Dutch road network? That is precisely the question Yinyi Ma addressed in her dissertation, which was funded and supervised by Statistics Netherlands (CBS). On 3 June, Yinyi Ma successfully defended her thesis at Erasmus University Rotterdam.
Traditionally, truck drivers and transport companies complete questionnaires for the traffic and transport statistics compiled by CBS. Nowadays, however, the number of available data sources has increased substantially, and the sources themselves are much more extensive. Examples include data from road sensors on motorways, GPS data and camera footage. PhD student Yinyi Ma: ‘I investigated how these new data sources can be integrated with existing sources and how road haulage can be estimated on the basis of these data.’
Yinyi Ma investigated a theoretical model (a hierarchical Bayesian network) that enables her to combine data from various sources. The model was successfully tested. ‘The model I propose in my research can enhance accuracy and lead to better estimates of road haulage movements.’ As yet, however, the model has not been tested in practice. Ma: ‘If we want to work more efficiently with big data, we must answer questions about data management, data models, evaluation and visualisation. My research is primarily aimed at two important components: models and evaluation.’
Co-operation with universities
Chris de Blois, researcher at CBS, supervised Yinyi Ma during her PhD research track. ‘Our department was very interested in the possibilities these new data sources have to offer for traffic and transport.’ De Blois emphasised that the PhD research track also led to closer and more frequent contacts between CBS and Erasmus University Rotterdam. ‘A nice spin-off’, says De Blois. ‘In future, we will benefit from the intensified contacts between our institutions. After all, CBS has a vast amount of data, but sometimes grapples with a lack of research capacity. In the academic world, it is the other way round. We are complementary.’
New step in working with big data
Big data developments are subject to constant change. CBS now compiles statistics on road traffic intensities purely on the basis of road sensor data. Combining such new data with traditional data creates better opportunities for analysis, which is a promising development. ‘A new step in dealing with big data’, De Blois says. ‘For us, Ma’s PhD research work is a source of inspiration. The validation of figures generated on the basis of big data is a point of particular interest for CBS.’
During the first years of her PhD track, Yinyi Ma worked two days a week at the CBS office in Heerlen. ‘In that period, I was in close contact with my supervisor Chris de Blois, and CBS publications on this subject proved to be very useful. I was also given the opportunity to attend statistics courses at Eurostat, the European Statistical Office. My CBS colleagues were very involved and helpful; it was a truly positive experience.’ Yinyi Ma obtained her PhD degree on 3 June at Erasmus University, Rotterdam School of Management. She is currently employed by IBM in the United States.