European countries working together on big data

On 23 and 24 February 2017, the first ESSnet Big Data Dissemination Workshop took place in the Bulgarian capital of Sofia. Within the ESSnet, national statistical offices affiliated to the European Statistical System (ESS) conduct joint research on the possibilities of using big data in the production of official statistics. The workshop was an opportunity to present the latest results in this area to a wider audience. The participants formed a highly diverse group: interested experts, representatives of Eurostat and other parts of the European Commission, international organisations such as the OECD and the United Nations, the scientific community and the Bulgarian private sector. Statistics Bulgaria proved itself to be an excellent host.

Exchanging ideas

Galya Stateva, who works at the Bulgarian National Statistical Institute as a specialist in the field of big data, is enthusiastic about the results of the workshop. ‘The purpose of the workshop was to exchange ideas and present the results of the first “Specific grant agreement”. It was also a great opportunity to share experiences in the areas of methodology and technology.’ According to Stateva, the presence of Statistics Bulgaria Director General Sergey Tsvetarsky, acting Director General of Eurostat Mariana Kotzeva and Tjark Tjin-A-Tsoi, Director General of Statistics Netherlands (CBS), was highly valuable. ‘This is a clear sign that the work of the ESSnet Big Data is supported at the highest level.’

Spearhead

How much progress has Statistics Bulgaria made in the implementation of big data for statistics? ‘We are an active partner in the international ESSnet Big Data project and we have contributed to the topics of “Webscraping enterprise characteristics” and “Dissemination”. In the first of these, we have been exploring the possibilities of webscraping, text mining and inference techniques for the purpose of collecting general company data. A spearhead for the coming years is the use of big data to study household expenditure. We are in contact with key retail chains in Bulgaria to obtain data on prices and products sold. This will be a valuable addition to the information on household expenditure that we receive through our official surveys, and it may improve current results.’ According to Stateva, the Bulgarian statistical office has only just started to apply big data in the production of statistics. ‘At the moment, we are mainly learning from the experiences of other statistical offices. However, our experts have already started to participate in international training courses.’

Maritime data

Aside from CBS’ Director General, various methodologists and researchers from the Dutch statistical office also attended the workshop in Sofia. This is because the production of new statistics based on big data is one of the spearheads for CBS. Peter Struijs, general coordinator of the ESSnet on behalf of CBS, has been closely involved from the start. ‘The ESSnet is a collaboration between 22 partners from 20 European countries. We do our work by means of grants (subsidies, ed.) and we discussed the results of the first grant during this workshop. An important conclusion was that we want to quickly enable the use of so-called AIS data, which show the position of vessels, in regular statistics at the European level. These are maritime data which are available in a standardised format for all countries and as such are well suited for international comparison.’

Collaboration

Struijs emphasises the importance of the ESSnet. ‘Various countries have already begun working with big data, but at the European level this is still in its earliest stage. The ESSnet has enabled countries to work together, which is very efficient. There are several pilots, each with some five countries represented and each focusing on a big data theme such as webscraping. For example, we would never have achieved such rapid progress with AIS data without joint research within the ESSnet by Denmark, Greece, the Netherlands and Norway. Another good example of collaboration is the Center for Big Data Statistics (CBDS), which was launched by CBS in September 2016. Over 40 national and international partners are affiliated, because together we achieve more.’

Access to big data

Marc Debusschere is big data coordinator at the Belgian statistical office and made a valuable contribution as chairman of the workshop. ‘The most important objective of the workshop in Sofia – presenting the results from the first phase of ESSnet to all participants – has certainly been met. Topics of discussion included: webscraping of job vacancies, webscraping of enterprise characteristics, smart energy meters, AIS data, mobile phone data, early estimates and combining multiple sources.’ After this initial phase, Debusschere believes, the technical, methodological and IT issues can be overcome, although there is still a lot of work to do. ‘Access to big data remains a big problem. Big data can often be found in enterprises, where the main objective is to make a profit. Helping to compile official statistics is not a high priority for them, even if costs are low. It is now becoming clear that voluntary partnerships are, in many cases, not sufficient to achieve sustainable statistical production. This means that we need legislative initiatives to be implemented in the very near future.’

Usability of mobile phone data

How much progress has the Belgian statistical office made with big data? ‘We are already using a number of big data sets obtained from enterprises or the internet. To calculate the consumer price index, scanner data of various retail chains are processed. These currently cover 70 percent of the total market, but this share will soon be increased to 90 percent. The reason for this increase is that we are now also collecting data from chains selling consumer durables such as electronics. In addition, prices of flight tickets, for example, are being collected through webscraping, also for the consumer price index.’ Debusschere is proud of the fact that Statistics Belgium, as one of the first statistical offices in Europe, has access to recent and extensive mobile phone data sets. ‘This has been realised as a result of a project we started at the end of 2015 in collaboration with Proximus – the largest network operator in Belgium – and Eurostat as well as the Joint Research Centre of the European Commission. During the past year, we conducted a number of pilot studies to explore the usability of mobile phone data in areas such as population, migration, mobility, transport and tourism. The results look promising, despite many methodological challenges. It is realistic to say that we may realise the first regular statistical production based on mobile phone data in the course of 2017.’