The case of Large Language Models at Statistics Netherlands

Preliminary research into the application of Large Language Models in Official Statistics.

Large Language Models (LLMs), such as GPT-5 by OpenAI, along with earlier pretrained architectures such as BERT by Google, have revolutionized the field of Natural Language Processing (NLP). These models can generate, translate, summarize, and answer questions about text in response to prompts, and they have attracted immense interest across many sectors, including government and industry. Like many other organizations, Statistics Netherlands (SN) explored the potential of LLMs. This report describes how we dealt with innovation and the pace of change with respect to LLMs. Research was conducted to assess whether LLMs are merely hype or genuinely useful, and whether they can be used responsibly at Statistics Netherlands. This report outlines the findings. The research had four objectives. The first objective was to demystify the algorithms behind LLMs. The second objective was to classify the different usage types of LLMs within SN and to identify the complexities and risks associated with each type. A usage type is defined as a cluster of usages that share similar properties. The deliverables of these first two objectives were training materials and high-level guidelines for responsible usage of LLMs. These deliverables served a single goal, namely to raise acceptance of Artificial Intelligence (AI) within an organization that is comfortable with traditional statistical methods but new to AI and LLMs. The third objective addressed the feasibility of implementing LLMs on SN’s internal IT infrastructure. In line with government recommendations, the study focused exclusively on open-source LLMs that can be used securely within SN. Pilot demonstrations were implemented for usage types that were expected to have a broad impact at SN. The final objective was to catalog existing LLM projects at SN and to propose new projects to enhance operational efficiency. The deliverables of these last two objectives were hands-on demonstrations, intended to provide practical experience with LLMs.
We concluded that the application of LLMs may hold significant promise for enhancing the efficiency and effectiveness of SN’s statistical processes and operations. By understanding the underlying technologies, identifying practical applications, and adhering to responsible AI guidelines, SN can safely experiment with LLMs to improve its services while mitigating the associated risks. We advise that future efforts focus on knowledge dissemination, targeted training (or acquisition) of personnel, the strategic positioning of SN in the landscape of generative AI, and, above all, gaining practical experience with LLMs. In terms of concrete projects with LLMs, it was advised to roll out a chat and code assistant to all SN employees, in particular for standard business operations. Additionally, it was advised to continue research into more niche applications of LLMs.

Ponsen, M., M. Puts, V. Toepoel (2025). How to cope with innovation and the pace of change: The case of Large Language Models at Statistics Netherlands. Discussion paper, Statistics Netherlands, The Hague/Heerlen.