CBS launching Center for Big Data Statistics

Author: Miriam van der Sangen
Statistics Netherlands (CBS) is focusing more and more on the use of big data for official statistics production. One step intended to give further shape to this ambition is the official launch on 27 September 2016 of CBS’ new Center for Big Data Statistics (CBDS). The launch is taking place during the official trade mission to South Korea led by Dutch Prime Minister Mark Rutte and State Secretary for Economic Affairs Martijn van Dam. During this visit, a bilateral cooperation agreement on big data will be signed with Statistics Korea (KOSTAT). The official statistics agency is investing heavily in big data. At Statistics Netherlands, programme director Magchiel van Meeteren, methodologist/data scientist Piet Daas and innovation manager Barteld Braaksma explain CBS’ ambition to become the undisputed global leader in the field of big data statistics.

Faster, more timely and more detailed

CBS has been using big data in its production of statistics for a number of years. One of the results was that in mid-2015, CBS became the first statistics bureau in the world to launch official traffic statistics produced with big data. A major advantage of using big data is that it can result in faster and more timely production and more detailed statistics. Van Meeteren: ‘CBS plays a leading role in the field of big data for official statistics. It is now time to bring together all disciplines in this area, speed up efforts and present ourselves with a clear identity. This is the reason why we are launching the Center for Big Data Statistics (CBDS).’ Physically, the CBDS will be operating from two different office locations which are connected in real time, with the main focus on the CBS office in Heerlen. There, CBS researchers and data scientists will be working on new statistics using big data in cooperation with PhD students, university students and experts associated with various national and international parties.

New techniques

As Van Meeteren explains, CBS’ objectives are threefold: ‘First of all, to realise faster production of our statistics: real-time statistics. This will allow us to respond better to society’s need to receive usable information more quickly. A second objective is for existing statistics to become available at a lower aggregation level, in particular those with data on regional and urban areas. In addition, big data offers opportunities to make statistics production more flexible and to formulate new indicators. Finally, we want to work based on the zero footprint concept; this means further reducing the administrative burden on companies and individuals by deploying new sources.’ Van Meeteren expects that implementing big data in statistics will also lead to improvements in efficiency and quality.

Quality of the figures

Piet Daas is a senior methodologist and data scientist at CBS. In 2009, Daas and his colleagues started searching for new ways to unlock data sources such as the internet, measurements by smartphones and other large and complex sources for the purpose of statistics production. The culmination of this work was the launch in mid-2015 of the first official big data statistics based on traffic loop data, a world first. Among recent projects is a study on the significance of the Dutch internet economy. In this study, data from standard CBS statistics on companies are combined with website data collected and processed by the company DataProvider, with contributions from Google. The first results of this study will become available early next month. According to Daas, the vast amounts of data are not the only major challenge where big data is concerned; quality also plays a crucial role. ‘CBS stands out from other organisations due to the high quality of its figures. We want to maintain this quality standard and are aware that it must also apply to big data. Given the unstable nature of such data sources, it is one of the points deserving extra attention.’

New ways of working and thinking

Daas is looking forward to working at the Center for Big Data Statistics. ‘As we are beginning to work together with external parties, this creates more possibilities than before to open up and research different data sources. It is a very heterogeneous group of partners, which is interesting, since that is how you can complement and sustain each other. Working with big data takes a whole different approach from compiling statistics in the traditional way. Together with external parties and our own data scientists, we are creating new methods and techniques which call for a new way of thinking. However, it is important to involve the departments within CBS with knowledge of the subject, as we have found during the development of new statistics based on traffic loop data. New facilities are also needed in IT. Hence, CBS has started to use a Spark cluster, equipment which allows for rapid analysis of large amounts of data.’ A point raised by Daas concerns the important issue of privacy and big data. ‘We have taken adequate measures within CBS. The work is entirely executed within the highly secure CBS environment.’

Innovative external partners

Barteld Braaksma is the innovation manager at CBS. One of his tasks involved approaching national and international partners for cooperation with the CBDS. ‘Partners are acting quickly and with enthusiasm. Seven different national statistical institutes (NSIs) have already signed up, as has the Statistical Office of the European Community (Eurostat). We do not only focus on statistical agencies, but also on renowned innovative external partners from the private or the public sector, for instance TNO, DNB, IBM, KPN and SURFsara. Furthermore, a large number of universities and colleges of higher education have joined, from Maastricht to Leiden and from Twente to Amsterdam.’ CBS’ level of ambition regarding the Center for Big Data Statistics is high, says Braaksma. ‘These are ambitions we must fulfil by ourselves. Not just by working with collaborative partners, but also within all our internal departments. Furthermore, it is critical that we collect input from usable datasets. More and more big data sources are becoming available at companies, institutes and authorities. Access to these sources is essential in order to realise our ambitions.’ Most parties are ready to cooperate with CBS by supplying the data anonymously.

Which parties are joining the CBDS?

A long list of national and international organisations has endorsed the Center for Big Data Statistics. These organisations will provide the knowledge and expertise needed to jointly achieve various social objectives. Relevant experiences are being re-used and technical solutions are being shared. One of the associated parties is Capgemini Nederland. Pieter Nieuwenboer, Head Insights and Data Netherlands, explains: ‘CBS is in a unique position to further develop the domain of big data. Capgemini Nederland will be pleased to contribute to this development by offering its knowledge on big data and topics such as security and mobility.’ Another enthusiastic response comes from Jeannine Peek, Director of Dell-EMC: ‘We are thrilled to cooperate with CBS at the Center for Big Data Statistics and produce new statistical products and services. Dell-EMC and Pivotal are bringing in expertise on data lakes, cloud native platforms and microservices on the basis of realised solutions in automotive, health care, finance and public services.’ Another partner is Microsoft Netherlands. According to General Manager Ernst-Jan Stigter, ‘Around the world, Microsoft sees formidable opportunities in the mobilisation of big data for societal purposes. We are looking forward to working with CBS in order to embrace these opportunities in the Netherlands. The Center for Big Data Statistics will play a key role in this endeavour.’ In addition, Humanity X stands behind the CBS initiative. Dr Ulrich Mans, co-founder of Humanity X: ‘Humanity X is a joint initiative by the Centre for Innovation of Leiden University, the city of The Hague and various other partners including universities, NGOs and IT companies. We provide support in the area of data-driven innovations to tackle global challenges and we will be working closely with the new Center for Big Data Statistics. We are thus joining a growing global network of innovators who want to benefit from the data revolution in order to achieve targets related to sustainable development (SDGs).’