Quinten Meertens’ breakthrough in how we think about bias
Author: Miriam van der Sangen
‘A breakthrough in how we think about bias’ – that is how professor Jaap van den Herik at Leiden University described Statistics Netherlands (CBS) researcher Quinten Meertens’ PhD thesis ‘Misclassification Bias in Statistical Learning’. Meertens’ research focused on ways to correct statistical bias in machine learning algorithms. His methodology is now integrated in CBS’ practices, and it has the potential to produce very relevant results for other statistical organisations, too.
After graduating in mathematics from the University of Amsterdam, Meertens joined CBS in 2015 as a statistical researcher in economic business statistics. He was regularly parachuted into other teams and worked on many different projects. He was appointed Project Leader for Online Trade in 2016. ‘I investigated how to estimate online purchases of goods from overseas,’ says Meertens. ‘My colleagues had already done several studies on that topic, but without finding a way to produce accurate estimates.’ Meertens immersed himself in the topic and achieved an initial estimate through web scraping and classification algorithms, which are used extensively in machine learning. That success earned him his division’s innovation prize. Towards the end of 2016, this talented mathematician decided to conduct PhD research in the same area.
Meertens spent the first two years of his PhD research exploring the research question: which data and methods are necessary to measure online purchases outside the Netherlands accurately? Over the course of his research, he discovered that the use of classification algorithms when producing official statistics can cause bias. Defending his thesis on 28 April 2021 at the University of Amsterdam, he explained this bias using a topical example: coronavirus infections in the Netherlands. ‘When deciding to impose and ease restrictions, the number of infections is important,’ Meertens explains, ‘not specifically who is infected. The outcome is based on PCR tests, but they do not always produce accurate results. As a consequence, adding up all the test results, the total outcome is incorrect: that is misclassification bias. The question is: how do you adjust for this bias?’
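The effect Meertens describes can be made concrete with a little arithmetic. The sketch below assumes a simple binary test characterised by a sensitivity and a specificity; the function name and all numbers are hypothetical illustrations, not figures from the thesis or from real PCR data.

```python
def expected_positive_rate(prevalence, sensitivity, specificity):
    """Expected share of positive results from an imperfect binary test."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives + false_positives


# Hypothetical values, chosen only for illustration.
prevalence = 0.05  # true share of infected people
observed = expected_positive_rate(prevalence, sensitivity=0.90, specificity=0.98)
bias = observed - prevalence

print(f"expected share counted as infected: {observed:.4f}")
print(f"misclassification bias: {bias:+.4f}")
```

With these made-up numbers, false positives outnumber false negatives, so the errors do not cancel out and the simple count overstates the true share of infections.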
A vast body of literature
Many reliable methods have been developed since the 1950s to correct for bias, says Meertens. ‘I collected a vast body of literature and compared different adjustment methods. We looked at two situations and were able to show which of the existing correction methods was the best. The first situation is the one-off production of a statistic based on classification algorithms. The “calibration estimator” is best for that situation. The second situation involves producing statistics over a longer period of time, resulting in time series. We showed that, under certain conditions, the “misclassification estimator” is the better method.’ Meertens believes it is important to use these correction methods for official statistics. ‘CBS actually already does this in practice, for example in statistics about solar panels, innovative companies and cybercrime.’
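As a rough illustration of the two correction methods named above, the sketch below uses the standard binary-case forms found in the literature; the thesis may define them more generally. The function names and the confusion-matrix counts are hypothetical, and in practice the error rates would themselves be estimated from a labelled test set, which adds uncertainty that this sketch ignores.

```python
def misclassification_estimator(observed_rate, sensitivity, specificity):
    """Invert the misclassification model to recover the true rate."""
    denominator = sensitivity + specificity - 1
    if denominator == 0:
        raise ValueError("classifier is uninformative: sensitivity + specificity = 1")
    return (observed_rate + specificity - 1) / denominator


def calibration_estimator(observed_rate, p_true_given_pos, p_true_given_neg):
    """Reweight each predicted class by the share that is truly positive."""
    return observed_rate * p_true_given_pos + (1 - observed_rate) * p_true_given_neg


# Hypothetical labelled test set of 1,000 units (assumed counts).
tp, fn, fp, tn = 45, 5, 19, 931

sensitivity = tp / (tp + fn)       # P(predicted positive | truly positive)
specificity = tn / (tn + fp)       # P(predicted negative | truly negative)
p_true_given_pos = tp / (tp + fp)  # P(truly positive | predicted positive)
p_true_given_neg = fn / (fn + tn)  # P(truly positive | predicted negative)

observed_rate = 0.064  # uncorrected share of predicted positives

corrected_m = misclassification_estimator(observed_rate, sensitivity, specificity)
corrected_c = calibration_estimator(observed_rate, p_true_given_pos, p_true_given_neg)

print(f"misclassification estimator: {corrected_m:.4f}")
print(f"calibration estimator:       {corrected_c:.4f}")
```

In this toy setup both estimators recover the same underlying rate, because the error rates are treated as known and the target population matches the test set. The comparison Meertens describes concerns how the methods behave when those quantities must be estimated and when the true rate shifts over time.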
Meertens’ PhD supervisors Jaap van den Herik and Cees Diks showered praise on the PhD candidate’s thesis. ‘This thesis brings together different research fields – econometrics, mathematics and computer science – in a unique way,’ explains Van den Herik, professor in Law and IT within the Faculty of Science at Leiden University. ‘The result of the study is a breakthrough in how we think about bias. Meertens has introduced the topic of bias to a new multidisciplinary field, through the scrupulous study and analysis of theory, methodology and techniques from the various fields studied.’
Cees Diks, professor in Data Analysis and Economic Statistics at the University of Amsterdam, stresses the relevance of Meertens’ PhD thesis to CBS and other statistical organisations. ‘They generate official statistics every day, and they are making more and more use of machine learning to do that. The direct practical application of the methodology Meertens used is supported by sophisticated theoretical aspects, such as the imposition of an upper limit for bias.’
During his research, Meertens benefited from valuable feedback from his supervisors and his co-supervisor Frank Takes. His CBS colleagues were also helpful as sparring partners, as they come up against the same issues in their work. ‘It was amazing the way my colleagues collaborated with me,’ Meertens remembers. ‘CBS also let me work on my thesis during office hours. I was really happy about that, because it helped me make rapid progress.’ Armed with his PhD, Meertens is continuing to work at CBS as head of Justice and Safety Statistics whilst also retaining his affiliation with the University of Amsterdam, as research fellow at CeNDEF. ‘It provides me with the opportunity to supervise students in their research, for example in the role of co-supervisor in the PhD research by Kevin Kloos that is now building on my thesis.’