Analysis of big data is becoming ever more important for governments seeking to make effective, sustainable policy choices in many different areas. For this reason, Statistics Netherlands (CBS) established the Center for Big Data Statistics (CBDS) in September 2016. The CBDS works with national and international partners from the public and private sectors, the scientific community and education towards implementing big data technology and methods in the production of statistics. One such partner is Fontys Hogeschool ICT in Eindhoven. Two groups of data science students from this university of applied sciences recently worked on assignments on behalf of the CBDS.
Gerard Schouten has been a professor in big data analytics at Fontys Hogeschool ICT since January 2016. He graduated in Physics from Eindhoven University of Technology and conducted doctoral research on visual perception. He went on to work as a senior scientist and in various other positions at Philips. ‘With the new Applied Data Science minor we launched in 2015, we want to pique students’ interest in big data and provide challenging, up-to-date input into education. We started out with 40 students in Eindhoven; last year, we had 60 students in both Eindhoven and Tilburg. From September this year onwards, we will develop this minor further into a completely new specialisation within Fontys Hogeschool ICT: Applied Data Science.’
The current Applied Data Science minor at Fontys Hogeschool ICT focuses mainly on teaching students how to collect, store and clean data, which may be open data or CBS data. Other core elements of the study programme include machine learning and data visualisation. ‘It’s exciting to learn that data can contain much more valuable information than we might initially think. After performing some calculations, interesting patterns may appear.’ The study programme also covers data ethics, and students participate in workshops on social physics to study the behaviour of large groups of people. ‘Students follow courses on these subjects for a period of six months. In addition, they put everything they have learned directly into practice as they perform case studies for companies.’
Technology and communication
What qualities do data science students need? ‘They must be thoroughly interested in technology, but also have strong communication skills. These skills are required in the initial stage of a big data project, when the research question is being defined, and in the final stage, when they need to be able to present the conclusions effectively’, says Schouten. He adds that acquiring company assignments for students is relatively easy: ‘Most companies – from ASML to Philips – are very keen and come up with exciting case studies. The intensive cooperation is very interesting to both parties. Through the students, companies are introduced to new technologies, while the students gain practical experience during their traineeship.’ Schouten is very positive about the assignments students have carried out for the CBDS. ‘It has been a success on both ends. CBS will participate again next year.’
Sam Jansen is a fourth-year student in software engineering at Fontys Hogeschool who is about to graduate. He chose the Applied Data Science minor because of his interest in big data. ‘It was quite a vague concept to me and I wanted to know more about it.’ The CBDS gave his group an interesting assignment. ‘From an archive containing some 60 million websites, we had to extract the Dutch websites and then select the companies that have a webshop but no physical shop. We first carried out the assignment on a small scale using machine learning algorithms. It took some time at first but it got better eventually.’ In performing his assignment, Jansen made use of data provided by Common Crawl, a non-profit organisation that crawls the web and makes its archives and datasets, containing millions of websites, freely available to the public.
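The selection Jansen describes can be pictured as a three-step filter. The sketch below is only an illustration and not the students' method: they used machine learning algorithms, whereas this stand-in uses simple keyword rules. The `.nl` domain test, the keyword lists, the `Page` type and the sample pages are all hypothetical assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical keyword lists (assumption): the features the students
# actually used are not described in the article.
WEBSHOP_TERMS = ("winkelwagen", "winkelmand", "bestellen", "afrekenen")
PHYSICAL_TERMS = ("openingstijden", "bezoekadres", "onze winkel")


@dataclass
class Page:
    url: str
    text: str


def is_dutch(page: Page) -> bool:
    # Assumption: a .nl domain is treated as a Dutch website.
    host = urlparse(page.url).hostname or ""
    return host.endswith(".nl")


def is_webshop(page: Page) -> bool:
    # Rule-based stand-in for the students' machine learning classifier.
    text = page.text.lower()
    return any(term in text for term in WEBSHOP_TERMS)


def has_physical_shop(page: Page) -> bool:
    # Stand-in for a location check such as the Google Maps lookup.
    text = page.text.lower()
    return any(term in text for term in PHYSICAL_TERMS)


def online_only_webshops(pages: list[Page]) -> list[Page]:
    # Keep Dutch pages that look like a webshop without a physical shop.
    return [
        p for p in pages
        if is_dutch(p) and is_webshop(p) and not has_physical_shop(p)
    ]


# Made-up sample pages, for illustration only.
pages = [
    Page("https://voorbeeld-shop.nl/", "Voeg toe aan winkelwagen en direct afrekenen"),
    Page("https://fietsenwinkel.nl/", "Bestellen kan online. Openingstijden: ma-za"),
    Page("https://example.com/", "Add to cart"),
    Page("https://nieuws.nl/", "Het laatste nieuws"),
]
selected = online_only_webshops(pages)
print([p.url for p in selected])
```

Keeping the three checks as separate functions means each rule can later be swapped for a trained model without changing the rest of the filter.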
After their research, the students concluded that 50 thousand of the 200 thousand selected Dutch websites were webshops, and that slightly under 13 thousand of these had no physical location. ‘We wanted to carry out this assignment with the knowledge already available, but at the same time we wanted to seek new ways to tackle it; that’s why we included working with Google Maps.’ Jansen is very positive about the coaching provided by lead data scientist Piet Daas, who is affiliated with both CBS and the CBDS: ‘We could always ask him questions, and he arranged a CBS lecture on big data for us, with lots of great tips.’ Daas, too, is very pleased with the collaboration with Fontys Hogeschool ICT: ‘I really enjoyed coaching these student groups. The results from both assignments offer many possibilities for the future. Both the research on webshops and the research on websites of innovative companies gave us results that provide a good basis for the production of new statistics based on big data.’
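To put the students' figures in proportion, the shares can be worked out directly from the rounded numbers quoted above. This is a small sketch only; the variable names are illustrative, and because the article says 'slightly under 13 thousand', the second share is an upper bound.

```python
# Rounded figures as reported in the article.
selected_dutch_sites = 200_000
webshops = 50_000
online_only_upper_bound = 13_000  # "slightly under 13 thousand"

# Share of selected Dutch websites that were webshops.
webshop_share = webshops / selected_dutch_sites

# Share of webshops without a physical location (upper bound).
online_only_share = online_only_upper_bound / webshops

print(f"Webshops among selected sites: {webshop_share:.0%}")
print(f"Webshops without a physical location: just under {online_only_share:.0%}")
```

In other words, about a quarter of the selected websites were webshops, and roughly a quarter of those webshops had no physical location.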