3. AI-producing companies
To describe the characteristics of companies in the Netherlands that produce AI, it is essential to know which companies make up that population. Although some sub-populations of AI companies are available12), there is not yet a complete overview of AI-producing companies. Moreover, these sub-populations have been compiled using ambiguous definitions.
In 2021, Statistics Netherlands (CBS), together with the company InnovatieSpotter, conducted an exploratory study into whether AI companies could be identified from the texts on their websites. That study could not reliably estimate the number of AI companies, partly because a (too) broad operationalisation had been used in compiling the training set of AI companies, so that the resulting models could not reliably distinguish between AI companies and non-AI companies. Since then, however, considerable progress has been made in identifying types of businesses from their website texts. Examples include the identification of innovative companies13), online platforms14), companies in the creative sector, and drone companies15).
The aim of the current project is to use new web scraping and machine learning techniques to map the population of AI companies in the Netherlands, specifically companies that produce AI systems. Section 3.1 describes the method used, Section 3.2 provides a statistical description of the identified AI companies, Section 3.3 provides a methodological review of the method, and Section 3.4 presents conclusions and recommendations.
3.1 Method
This section describes the method by which the population of AI firms was composed.
3.1.1 Conceptual delineation of AI companies
To identify AI companies, it is essential to first define what an AI system is, what an AI company is, and how to assess whether a company is an AI company in practice.
Definition AI system
This project uses the most recent definition of an AI system, established by the Organisation for Economic Co-operation and Development (OECD) in 2023. CBS is thereby consistent with the definition of an AI system in the European AI Act (Regulation (EU) 2024/1689). According to this definition, an AI system is: ‘a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.’ A non-exhaustive set of examples of AI systems under this definition includes autonomous robots, self-driving cars, machine learning models used for data analysis, AI-driven image analysis, and generative AI models that produce text and/or images from prompts.
Definition of AI company
An AI company is defined as: 'A company whose main activity is the production of AI systems.' Thus, according to this definition, a company that merely uses AI systems is not an AI company. In contrast, a company that focuses primarily on the production of AI systems is an AI company. The key question here is whether the company would not exist, or would be substantially different in nature, if it did not produce AI systems.
Operationalisation of AI company
A company is considered an ‘AI company’ if its website shows that it meets the definition of an AI company. Some subjectivity in this manual assessment is unavoidable, especially for the question of whether a company would not exist, or would be substantially different, if it did not produce AI systems.
3.1.2 Population and training set
To identify company websites, a very large list of websites16) was linked to the CBS business register in the first quarter of 2024. This yielded more than 713 thousand websites linked to businesses in the Netherlands, which together constitute the population of business websites in the Netherlands. For each website, up to 200 webpages were scraped.
A training set of manually classified websites of AI companies and non-AI companies was compiled in the third quarter of 2024. The positive examples consisted of websites of companies participating in the Dutch AI Coalition (NL AIC) or companies that had received a grant through the AiNed Investment Programme, supplemented with websites of AI companies found through a Google search. All of these companies had to meet the operationalisation of an AI company. The negative examples consisted of websites of companies from the same sources that did not meet this operationalisation, supplemented by a sample of non-AI company websites. To extend the training set further, it was used to train a logistic regression model, which was then applied to the population of company websites. Websites with a high probability of belonging to an AI company were reviewed manually and added to the training set. The final training set contained 294 websites of AI companies and 2,436 websites of non-AI companies.
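The training-set extension step can be illustrated with a small sketch. It is a hypothetical example in plain Python, not the CBS implementation: the tokenizer, the toy texts, the `.example` URLs, the learning rate and the review threshold are all assumptions. It fits a bag-of-words logistic regression on labelled website texts, scores unlabelled websites and returns those with a high probability for manual review.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Bag-of-words counts of lower-cased alphabetic tokens (assumed preprocessing)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(docs, labels, epochs=200, lr=0.5):
    """Logistic regression fitted by per-example gradient descent on the log-loss."""
    weights = {}
    bias = 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            z = bias + sum(weights.get(t, 0.0) * c for t, c in doc.items())
            g = sigmoid(z) - y  # derivative of the log-loss with respect to z
            bias -= lr * g
            for t, c in doc.items():
                weights[t] = weights.get(t, 0.0) - lr * g * c
    return weights, bias

def predict_proba(model, doc):
    weights, bias = model
    return sigmoid(bias + sum(weights.get(t, 0.0) * c for t, c in doc.items()))

def candidates_for_review(model, population, threshold=0.9):
    """(url, probability) pairs at or above the threshold, highest probability first."""
    scored = [(url, predict_proba(model, tokenize(text))) for url, text in population.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)

# Toy labelled examples (1 = AI company, 0 = non-AI company); purely illustrative.
train = [
    ("we build machine learning models and computer vision software", 1),
    ("our deep learning platform trains neural networks for clients", 1),
    ("bakery with fresh bread and pastries every day", 0),
    ("plumbing and heating installation services", 0),
]
model = train_logreg([tokenize(t) for t, _ in train], [y for _, y in train])

# Hypothetical unlabelled websites from the population.
population = {
    "ai-startup.example": "we develop neural networks and machine learning software",
    "bakery.example": "fresh bread and cakes from our bakery",
}
review_list = candidates_for_review(model, population, threshold=0.5)
```

In practice the reviewed candidates would be labelled by hand and appended to `train` before retraining; the threshold controls how many websites end up on the review list.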
3.1.3 Development of the model
Several types of models were trained on the final training set. The random forest model performed best and was therefore applied to the population of company websites. However, this first random forest model still produced many false positives: a manual review of 200 randomly selected potential AI websites showed that only four actually belonged to AI companies. To reduce the rate of false positives, six more random forest models were trained on training sets that varied in the number of positive and negative examples. For the negative examples, a further distinction was made between non-AI websites that did and did not resemble the websites of AI companies.
The scientific literature reports positive results for combining (multiple) models, because each model bases its classifications on different properties17). Since a combination of models was expected to distinguish better between AI and non-AI websites, a combination was used. The models to include were chosen by evaluating which (combinations of) models yielded the smallest possible subset of potential AI websites while still containing as many known AI company websites as possible. A website was then classified as a ‘potential AI website’ if at least six of the seven models assigned it a high probability of belonging to an AI company. This resulted in a list of 1,281 potential AI websites, which included more than 90 percent of the AI websites that had already been identified. Manual review showed that 293 of these actually belonged to an AI company. Combined with the websites in the training set, this yielded a population of 450 AI company websites.
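The voting rule and the model-selection criterion described above can be sketched as follows. This is an illustrative reconstruction, not the CBS code: the probability threshold of 0.5, the minimum share of 0.9 of known AI websites used as the selection criterion, the exhaustive search over model combinations and the toy data are assumptions. Each model is represented simply as a mapping from website to predicted probability.

```python
from itertools import combinations

def potential_ai(model_probs, threshold=0.5, min_votes=6):
    """Websites that at least `min_votes` models score at or above `threshold`."""
    urls = set().union(*model_probs)
    return {
        url for url in urls
        if sum(probs.get(url, 0.0) >= threshold for probs in model_probs) >= min_votes
    }

def evaluate(flagged, known_ai):
    """Size of the flagged subset and the share of known AI websites it retains."""
    recall = len(flagged & known_ai) / len(known_ai) if known_ai else 0.0
    return len(flagged), recall

def best_combination(models, known_ai, min_recall=0.9, threshold=0.5):
    """Smallest flagged subset over all model combinations and vote thresholds
    that still retains at least `min_recall` of the known AI websites."""
    best = None
    for k in range(1, len(models) + 1):
        for combo in combinations(range(len(models)), k):
            chosen = [models[i] for i in combo]
            for votes in range(1, k + 1):
                size, recall = evaluate(potential_ai(chosen, threshold, votes), known_ai)
                if recall >= min_recall and (best is None or size < best[0]):
                    best = (size, combo, votes)
    return best

# Three toy models instead of seven; the probabilities are made up for illustration.
models = [
    {"a": 0.90, "b": 0.80, "c": 0.70, "d": 0.20},
    {"a": 0.95, "b": 0.60, "c": 0.10, "d": 0.70},
    {"a": 0.85, "b": 0.90, "c": 0.60, "d": 0.90},
]
known_ai = {"a", "b"}
flagged = potential_ai(models, min_votes=3)  # all three models must agree
best = best_combination(models, known_ai)    # (subset size, model indices, votes needed)
```

With seven models, calling `potential_ai(models, min_votes=6)` corresponds to the "at least six of the seven models" rule; the exhaustive search is only practical because the number of models is small (127 combinations for seven models).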
3.2 Results: statistical description of AI companies
This section describes some demographic and financial characteristics of the identified AI companies. To this end, the websites of AI companies (Section 3.1.3) were linked to the business register of Statistics Netherlands and enriched with turnover data and other financial information on the companies concerned. At the time of analysis, turnover data were available up to and including 2023; other financial information was only available up to and including 2022. The 450 websites belonged to 402 different AI companies operating in the Netherlands in 2024. This section presents the key findings; a more detailed statistical description of these AI companies can be found in the table set accompanying this chapter. All figures are provisional, as the methodology may be improved in the future (see Sections 3.3 and 3.4).
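Because one company can operate several websites, the 450 websites map to fewer companies (402). A minimal sketch of that many-to-one step, assuming a hypothetical lookup table from website to business-register unit identifier (the URLs and identifiers below are invented):

```python
from collections import defaultdict

def group_by_company(ai_websites, website_to_unit):
    """Group AI websites by the business-register unit they are linked to.

    Websites without a link in `website_to_unit` are returned separately,
    so they can be checked instead of silently dropped.
    """
    by_unit = defaultdict(list)
    unlinked = []
    for url in ai_websites:
        unit = website_to_unit.get(url)
        if unit is None:
            unlinked.append(url)
        else:
            by_unit[unit].append(url)
    return dict(by_unit), unlinked

websites = ["alpha.example", "alpha-labs.example", "beta.example", "gamma.example"]
links = {"alpha.example": "U001", "alpha-labs.example": "U001", "beta.example": "U002"}
companies, unlinked = group_by_company(websites, links)
```

`len(companies)` then gives the number of distinct companies behind the identified websites.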
3.2.1 Demographic characteristics
In 2024, 97 percent of the AI companies were small and medium-sized enterprises (SMEs)18). Only 3 percent of the AI companies had 250 or more employees.
| Type | ≤ 1 employee (% of companies) | 2 - 9 employees (% of companies) | 10 - 99 employees (% of companies) | 100 - 249 employees (% of companies) | ≥ 250 employees (% of companies) |
|---|---|---|---|---|---|
| AI companies | 35.3 | 33.3 | 24.4 | 4.2 | 2.7 |
| All companies | 82.6 | 14.5 | 2.6 | 0.2 | 0.1 |
Note: provisional figures.
Most AI companies belonged to sector J Information and communication (63 percent). Other common sectors were M Specialised business services (23 percent) and C Manufacturing (5 percent).
| Type | C Manufacturing (% of companies) | G Trade (% of companies) | J Information and communication (% of companies) | M Specialised business services (% of companies) | Other sectors (% of companies) |
|---|---|---|---|---|---|
| AI companies | 4.7 | 3.0 | 63.4 | 23.1 | 5.7 |
| All companies | 3.6 | 12.7 | 5.4 | 20.3 | 57.9 |
Note: provisional figures.
Most AI companies (83 percent) were private limited companies (bv) under Dutch law. Smaller shares were sole proprietorships (12 percent) or had another legal form (5 percent).
| Type | Sole proprietors (% of companies) | Private companies (% of companies) | Other (% of companies) |
|---|---|---|---|
| AI companies | 11.9 | 82.8 | 5.2 |
| All companies | 68.8 | 19.8 | 11.4 |
Note: provisional figures.
Nearly half of the AI companies (47 percent) had been founded within the past five years. About one in ten AI companies had existed for more than 15 years.
| Type | 2005-2009 (% of companies) | 2010-2014 (% of companies) | 2015-2019 (% of companies) | 2020-2024 (% of companies) |
|---|---|---|---|---|
| AI companies | 11.4 | 11.9 | 29.4 | 47.3 |
| All companies | 18.5 | 14.2 | 21.8 | 45.5 |
Note: provisional figures.
Most of the AI companies were located in the provinces of Noord-Holland (32 percent) and Zuid-Holland (24 percent). Utrecht (11 percent) and Noord-Brabant (10 percent) also had relatively high numbers of AI companies.
| Province | % of AI companies |
|---|---|
| Groningen | 3.0 |
| Fryslân | |
| Drenthe | |
| Overijssel | 2.7 |
| Flevoland | |
| Gelderland | 7.2 |
| Utrecht | 11.4 |
| Noord-Holland | 32.1 |
| Zuid-Holland | 23.9 |
| Zeeland | |
| Noord-Brabant | 9.7 |
| Limburg | 2.2 |
Note: provisional figures. The figures show percentage shares.
Most AI companies (95 percent) were headquartered in the Netherlands. The remaining 5 percent belonged to a parent company based abroad; in 37 percent of these cases, the parent company was based in the United States.
3.2.2 Financial characteristics
This section describes some financial characteristics of AI companies: turnover, revenues, wages, operating expenses, operating result and value added. Because a number of industries fall outside the scope of the financial statistics, only the following industries are reported: wholesale and commission trade, transportation and storage, accommodation and food services, information and communication, specialised business services, other business services, and other services. Because of missing values, revenues, wages and operating expenses have been imputed for some companies. In addition, because of the large number of missing values for smaller companies, the operating result and value added are only reported for companies with ten or more employees. For more background and details, see the Explanation tab in the table set accompanying this chapter.
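According to the notes to this chapter, missing values were imputed with a linear regression model. A minimal sketch of such regression imputation for one variable, assuming turnover and number of employees as the two predictors and using ordinary least squares solved via the normal equations in plain Python. The field names, the toy records and the model specification (no transformations, no stratification) are assumptions, not the CBS procedure.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows):
    """OLS fit of y = b0 + b1*turnover + b2*employees from (turnover, employees, y) rows."""
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for turnover, employees, y in rows:
        x = (1.0, turnover, employees)
        for i in range(3):
            xty[i] += x[i] * y
            for j in range(3):
                xtx[i][j] += x[i] * x[j]
    return solve(xtx, xty)

def r_squared(rows, coef):
    """Coefficient of determination of the fitted model on the observed rows."""
    ys = [y for _, _, y in rows]
    mean_y = sum(ys) / len(ys)
    pred = [coef[0] + coef[1] * t + coef[2] * e for t, e, _ in rows]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def impute(records, coef, target="revenues"):
    """Fill missing `target` values with regression predictions and flag them."""
    out = []
    for rec in records:
        rec = dict(rec)
        rec["imputed"] = rec[target] is None
        if rec["imputed"]:
            rec[target] = coef[0] + coef[1] * rec["turnover"] + coef[2] * rec["employees"]
        out.append(rec)
    return out

# Toy data (x 1,000 euro), generated as revenues = 10 + 1.1*turnover + 5*employees.
observed = [(100, 1, 125), (200, 3, 245), (150, 2, 185), (400, 5, 475), (300, 2, 350)]
coef = fit_ols(observed)
records = [{"turnover": 250, "employees": 4, "revenues": None},
           {"turnover": 120, "employees": 1, "revenues": 140}]
filled = impute(records, coef)
```

Flagging imputed values keeps them distinguishable from observed ones, which matters when reporting how much of a published figure rests on model predictions.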
In 202319), the largest group of AI companies (43 percent) had a turnover of between 100 thousand and 1 million euros. AI companies were more likely than other companies in the Netherlands to have a relatively high turnover: 27 percent had a turnover of 1 million euros or more, compared with 6 percent of all companies.
| % of companies, by turnover (x 1,000 euro) | < 10 | 10 - 99 | 100 - 999 | 1,000 - 9,999 | 10,000 - 49,999 | ≥ 50,000 |
|---|---|---|---|---|---|---|
| AI companies | 12.6 | 17.0 | 42.9 | 18.1 | 5.5 | 3.8 |
| All companies | 35.0 | 36.6 | 22.3 | 5.1 | 0.8 | 0.3 |
Note: provisional figures. Only turnover figures for 2023 were available. For companies that existed in both 2023 and 2024, the turnover figure for 2023 is shown.
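The percentages in the table are shares of companies per turnover class. A minimal sketch of that classification, using the class bounds from the table (in thousands of euros) and the standard-library `bisect` module for the lookup; the input values are invented:

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds (x 1,000 euro) of every class except the first, as in the table.
BOUNDS = [10, 100, 1_000, 10_000, 50_000]
LABELS = ["< 10", "10 - 99", "100 - 999", "1,000 - 9,999", "10,000 - 49,999", "≥ 50,000"]

def turnover_class(turnover_k):
    """Map a turnover in thousands of euros to its class label."""
    return LABELS[bisect_right(BOUNDS, turnover_k)]

def class_shares(turnovers):
    """Percentage of companies per class, rounded to one decimal as in the table."""
    counts = Counter(turnover_class(t) for t in turnovers)
    n = len(turnovers)
    return {label: round(100 * counts[label] / n, 1) for label in LABELS}

shares = class_shares([5, 50, 500, 500, 2_000, 60_000])
```

Because each share is rounded separately, the shares need not add up to exactly 100, which is also why a row such as the AI companies row above sums to 99.9.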
Of the AI companies with at least one employee, the majority (85 percent) had revenues20) of less than 5 million euros in 202221). Only a small portion (5 percent) had revenues of more than 50 million euros. The remainder (10 percent) had revenues between 5 and 50 million euros.
| Revenues (x 1,000 euro) | % of AI companies |
|---|---|
| < 5,000 | 85 |
| 5,000 – 9,999 | 3 |
| 10,000 – 49,999 | 7 |
| ≥ 50,000 | 5 |
Note: provisional figures. Only figures for 2022 were available. For companies that existed in 2022, 2023 and 2024, the revenues figure for 2022 is shown.
Of the AI companies with at least one employee, the majority (81 percent) had operating expenses22) of less than 5 million euros in 2022. About 7 percent had operating expenses of more than 25 million euros. The remaining 13 percent had operating expenses between 5 and 25 million euros.
| Operating expenses (x 1,000 euro) | % of AI companies |
|---|---|
| < 5,000 | 81 |
| 5,000 – 9,999 | 8 |
| 10,000 – 24,999 | 5 |
| ≥ 25,000 | 7 |
Note: provisional figures. Only figures for 2022 were available. For companies that existed in 2022, 2023 and 2024, the figure for operating expenses in 2022 is shown.
Of the AI companies with at least one employee, the majority (94 percent) had wage expenses23) of less than 10 million euros in 2022. Only a small share (6 percent) had wage expenses of 10 million euros or more.
More than half (55 percent) of AI companies with 50 or more employees had an operating result24) of less than 2.5 million euros in 2022. Over one-third (37 percent) had a higher operating result; for the remaining 8 percent, the operating result is unknown.
| Operating result (x 1,000 euro) | % of AI companies |
|---|---|
| < 500 | 34 |
| 500 – 2,499 | 21 |
| ≥ 2,500 | 37 |
| Unknown | 8 |
Note: provisional figures. Only figures for 2022 were available. For companies that existed in 2022, 2023 and 2024, the operating result in 2022 is shown.
Nearly two-thirds (63 percent) of AI companies with 50 or more employees had value added25) of less than 23 million euros in 2022. Almost 30 percent had higher value added; for the remaining 8 percent, value added is unknown.
| Value added (x 1,000 euro) | % of AI companies |
|---|---|
| < 10,000 | 29 |
| 10,000 – 22,999 | 34 |
| ≥ 23,000 | 29 |
| Unknown | 8 |
Note: provisional figures. Only figures for 2022 were available. For companies that existed in 2022, 2023 and 2024, the figure for value added in 2022 is shown.
3.3 Methodological review
This section reviews the method described in Section 3.1. A central question is to what extent this method can identify the entire population of AI companies in the Netherlands.
3.3.1 Use of website texts
The method identifies AI companies on the basis of website texts. According to CBS figures, 86 percent of companies in the ICT sector with two or more employees had a website in 2024. Assuming this also holds for AI companies, the vast majority of AI companies in the Netherlands are likely to have a website from which they can be identified as AI-producing companies. However, although the list of websites linked to the business register is very extensive, it does not contain all websites of companies in the Netherlands: nearly 140 of the AI websites found manually did not appear in the population of company websites. One reason may be that these websites were created, or the companies established, after the link with the business register was made (March 2024).
In addition, when a company's website covers only part of its activities, the production of AI systems may mistakenly be identified as its main activity, so that the company is mistakenly classified as AI-producing. It is unclear for how many companies this is the case. Despite these caveats, the method, which supplements existing sources with newly found AI-producing companies, is likely to provide adequate insight into the current population of these companies in the Netherlands. Because the most influential AI companies are plausibly affiliated with the Dutch AI Coalition (NL AIC) and/or have received a grant through the AiNed Investment Programme in recent years, no significant companies are expected to be missing.
3.3.2 False positives and false negatives
A combination of random forest models proved most suitable for identifying potential AI websites. Even so, the combination still yielded a relatively large share of potential AI websites that turned out not to belong to AI-producing companies (‘false positives’). These false positives were removed from the final population of AI companies through manual review. However, the combination of models may also have missed websites that did belong to AI-producing companies (‘false negatives’). Because these companies make up only a (very) small part of the total company population in the Netherlands, the share of false negatives is difficult to determine: a random sample of company websites would contain very few AI companies, so a reliable estimate would require manually reviewing a very large sample.
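The figures reported in Section 3.1.3 allow a rough calculation of how much the combination of models reduced the false positives, and of why false negatives are hard to measure. The sketch below only reuses those numbers; treating 450 AI websites among roughly 713 thousand business websites as the prevalence is a simplifying assumption, since part of the AI websites were not in the population.

```python
import math

def precision(true_positives, flagged):
    """Share of flagged websites that actually belong to an AI company."""
    return true_positives / flagged

single_model = precision(4, 200)      # first random forest, manual check of 200 websites
combination = precision(293, 1_281)   # six-out-of-seven vote, full manual review

# Rough prevalence of AI websites in the population (simplifying assumption).
prevalence = 450 / 713_000

def expected_ai_in_sample(sample_size):
    """Expected number of AI websites in a simple random sample of business websites."""
    return prevalence * sample_size

def sample_size_for(expected_hits):
    """Sample size needed to expect `expected_hits` AI websites in a random sample."""
    return math.ceil(expected_hits / prevalence)
```

On these rough numbers, the share of true AI websites among the flagged websites is 2 percent for the first model and about 23 percent for the combination, while a random sample of 1,000 business websites would be expected to contain fewer than one AI website; about 16 thousand websites would have to be reviewed to expect ten.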
3.4 Conclusion and recommendations
This project used web scraping and machine learning to identify a population of 402 AI-producing companies operating in 2024. It is possible that some AI companies were not identified by the current method, but these are not likely to be large companies.
The current method could be improved in several ways. One time-consuming step is the manual review of all potential AI websites. As the number of AI companies grows, so will the time required for this review. It is therefore important to investigate whether the share of false positives can be reduced. One starting point is to train a new model on the full set of AI websites identified in this project. The link between Dataprovider.com's list of websites and the business register could also be improved. These improvements may help reduce the share of false negatives. In addition, the assessment of whether a company is an AI-producing company is somewhat subjective (see Section 3.1.1). In this study, the manual assessments were made separately by two people. To increase the reliability of the assessments, future work could look at ways to increase consistency between the two assessors.
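One common way to quantify the consistency between two assessors, assuming both classify the same set of websites, is Cohen's kappa: the observed agreement corrected for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). This is a suggested measure, not one used in the study; a minimal sketch with invented ratings:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    if p_e == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)

# 1 = AI-producing company, 0 = not; hypothetical assessments of eight websites.
assessor_1 = [1, 1, 0, 0, 1, 0, 1, 0]
assessor_2 = [1, 1, 0, 1, 1, 0, 0, 0]
kappa = cohens_kappa(assessor_1, assessor_2)  # 0.5: 75% observed vs 50% chance agreement
```

Kappa is preferable to raw percentage agreement here because most websites are non-AI: two assessors who label nearly everything "non-AI" would agree often by chance alone.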
Finally, AI is a constantly evolving phenomenon. New AI terms may emerge over time that the current models do not recognise. In addition, terms that are now typical of websites of AI-producing companies may become more common on websites of non-AI companies, and vice versa. As a result, models that perform well now may perform less well in the future. It is important to keep evaluating the effectiveness of the developed models and to adjust them where needed.
12) These include the list of participants in the Dutch AI Coalition (NL AIC), the European AI Startup Landscape of the NL AIC, and the list of companies that have received a grant through the AiNed Investment Programme.
13) Daas, P. J. H., & van der Doef, S. (2020). Detecting innovative companies via their website. Statistical Journal of the IAOS, 36(4), 1239-1251.
14) Daas, P., Hassink, W., & Klijs, B. (2023). On the Validity of Using Webpage Texts to Identify the Target Population of a Survey: An Application to Detect Online Platforms. Journal of Official Statistics, 40(1), 190-211.
15) Daas, P., Tennekes, M., de Miguel, B., de Miguel, M., SantaMarina, V., & Carausu, F. (2022). Web intelligence for measuring emerging economic trends: the drone industry. (Statistical Working Papers). Office for Official Publications of the EC.
16) The list contained more than 7 million URLs and was supplied to CBS by the Dutch company Dataprovider.com.
17) Gubbels, L., Puts, M., & Daas, P. (2024). Bias Correction in Machine Learning-based Classification of Rare Events. Presentation at the Symposium on Data Science and Statistics (SDSS) 2024, Statistical Data Science track, Classification and Modeling session, Richmond, VA, USA.
18) Companies with fewer than 250 employees.
19) Turnover data were only available for 2023; the descriptions therefore refer to companies active in 2024 that were already in business in 2023.
20) Revenues from actual business operations. These include sales of goods and services, as well as the value of changes in inventory, production for company-internal use, and subsidies and damage claims.
21) The other financial variables were only available for 2022 and for AI companies with at least one employee; the descriptions therefore refer to companies active in 2024 that were already in business in 2022. Missing values for revenues, operating expenses and wages were imputed using a linear regression model based on turnover and number of employees (R² > 0.8).
22) Costs incurred to achieve operating income. These include the cost of sales, labour costs, depreciation of fixed assets, and other operating expenses.
23) Total wage costs of all employees on the payroll, after deducting sick pay and wage (cost) subsidies.
24) Operating revenues minus operating expenses; that is, the result obtained from production activities.
25) The difference between output (basic prices) and intermediate consumption (excluding deductible VAT).