Big data technology
The cooperation with KOSTAT in the field of big data dates back to September 2016. ‘Director General Tjin-A-Tsoi of Statistics Netherlands signed a Memorandum of Understanding (MoU) with KOSTAT for collaboration in the Center for Big Data Statistics (CBDS) of Statistics Netherlands. The CBDS sees big data technology as an opportunity to innovate its statistics, together with renowned national and international partners. One of the agreements that resulted from the MoU was that a KOSTAT employee, Hae-Ryun Kim, would be posted to the CBDS for a longer period of time,’ says CBS innovation manager Barteld Braaksma. ‘My proposal then was to tackle the internet economy as one of the research themes. In 2016, in collaboration with Google and Dataprovider, Statistics Netherlands carried out innovative big data research into the size and nature of the Dutch internet economy. We have mapped out which Dutch companies are active via the internet and how important their online presence is for them.’ This was a challenging job, because both the method of combining data and some of the data sources used were new. Statistics Netherlands is developing expertise in, for example, web scraping and other big data technologies.
New data sources
Researcher Hyun-Joon Jung and Director of Data Science Yong-Chan Jung are both employed at KISDI. This government-affiliated institute makes an important contribution to the realisation of IT policy and the development of the national economy in South Korea. It also carries out research commissioned by KOSTAT. Hyun-Joon Jung and his colleague show great interest in measuring the internet economy. ‘In our time, the internet is crucial and it is also becoming increasingly important for the South Korean economy. Some international studies have been conducted into measuring the internet economy, but what makes Statistics Netherlands’ research special is that it draws on the strengths of a wide range of sources. It uses web scraping (a technique in which software retrieves and analyses information from web pages, ed.), administrative data and data from business surveys. With clever matching techniques, these sources are combined to create a coherent picture. Because the internet economy is changing at tremendous speed, it is important to use both big data and administrative data. It is precisely this combination of new and traditional data sources that provides added value.’
Linking websites to companies
Hyun-Joon Jung and his colleague followed an intensive 3-day programme, in which CBS colleagues showed all kinds of facets of the internet economy research. During that programme, Magda Slootbeek, statistical researcher, explained the significance of the General Business Register (GBR) of Statistics Netherlands. ‘In the research into the internet economy, we linked the websites of the companies that are active on the internet to the GBR. Based on the content of these websites, we have determined how companies use the internet. That yielded interesting information. In total, some 50,000 companies form the core of the internet economy in our country. These include web shops, online services and internet-related ICT companies. Surprisingly, two-thirds of the companies did not have a website.’
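To give an impression of what linking websites to a business register can involve, here is a minimal sketch in Python. It is not the method used by Statistics Netherlands, which the article does not describe in detail. It simply assumes that each register entry carries a known website address and matches scraped sites on their normalised domain name. The field names (`id`, `website`, `url`, `has_webshop`) and both functions are hypothetical and serve only as illustration.

```python
from urllib.parse import urlsplit


def normalize_domain(url: str) -> str:
    """Reduce a URL to a bare lowercase host name without a leading 'www.'."""
    url = url.strip()
    if "://" not in url:
        # urlsplit only recognises the host when a scheme is present
        url = "http://" + url
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def link_websites(register, scraped_sites):
    """Match scraped sites to register entries by normalised domain.

    `register` and `scraped_sites` are lists of dicts with hypothetical
    field names; returns (linked records, urls that could not be linked).
    """
    by_domain = {}
    for company in register:
        if company.get("website"):
            by_domain[normalize_domain(company["website"])] = company["id"]

    linked, unlinked = [], []
    for site in scraped_sites:
        company_id = by_domain.get(normalize_domain(site["url"]))
        if company_id is None:
            unlinked.append(site["url"])
        else:
            # Keep the scraped attributes and attach the register identifier
            linked.append({"company_id": company_id, **site})
    return linked, unlinked


# Illustrative, made-up records
register = [
    {"id": "A1", "website": "https://www.example.nl"},
    {"id": "B2", "website": None},  # a company without a known website
]
scraped_sites = [
    {"url": "http://example.nl/shop", "has_webshop": True},
    {"url": "unknown.org", "has_webshop": False},
]
linked, unlinked = link_websites(register, scraped_sites)
```

In practice, exact domain matching would probably leave many sites unlinked, which is why additional matching on, for instance, company names or addresses found on the pages is a plausible extension. That too is an assumption rather than something stated in the research.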
According to Braaksma, KOSTAT is not the only organisation interested in research into the internet economy. The German, Irish and Mexican statistical offices, for example, have also shown interest, as have international organisations such as Eurostat, the OECD and the IMF. ‘It would be very interesting if Statistics Netherlands could publish data on the internet economy regularly rather than as a one-off, but that depends on the availability of resources. There are also various possibilities to further expand the research, for example with features that say something about the security of websites against cyber attacks.’ According to Braaksma, KISDI has now signed a contract with Dataprovider (a Dutch company that collects and structures various attributes of companies in dozens of countries via the internet, ed.) and is starting to work with the datasets supplied by Dataprovider to map the internet economy of South Korea. ‘Statistics Netherlands’ research forms the basis for this.’