Procedure
The text on a company’s website homepage is used to decide whether the business is innovative. Punctuation marks and common general words (stop words) are removed from the text of each site, and the remaining words form the initial dataset for developing an algorithm that can distinguish between innovative and non-innovative businesses. Because the CBS innovation survey, which covers larger companies, tells us which of these businesses are innovative and which are not, we use their websites to train the algorithm. Ultimately, this produces a list of words that are important for classifying innovation, such as ‘technology’, ‘new product’, ‘innovation’ and ‘software’. The language of the website is another important indicator: a company with an English-language website is statistically more likely to be innovative than one with a Dutch-language website. Some words actually indicate that a business is not innovative, such as ‘shop’, ‘transport’, ‘restaurant’ and ‘service’. This does not mean that a shop can never be innovative; the combination of the other words on the website also counts. The latest version of the algorithm classifies large companies’ websites as innovative or non-innovative with 93% accuracy.
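The text does not name the classification method used. As a minimal sketch, assuming a bag-of-words model, the procedure above (strip punctuation and stop words, train on websites whose innovation status is known from the survey, treat the site language as a feature) could look like the following. Multinomial naive Bayes is swapped in here as a stand-in classifier; the stop-word list, the `lang:` feature token and the toy training texts are hypothetical placeholders, not CBS data.

```python
import math
import string
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use full Dutch and English lists.
STOP_WORDS = {"the", "and", "a", "of", "we", "our", "de", "het", "een", "en", "wij", "onze"}

def preprocess(text: str, language: str) -> list[str]:
    """Lowercase, strip punctuation and stop words, and add a language token."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    # Hypothetical feature encoding: the site language becomes one extra token.
    tokens.append(f"lang:{language}")
    return tokens

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing; a stand-in classifier."""

    def fit(self, docs: list[list[str]], labels: list[bool]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts = Counter(labels)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens: list[str]) -> bool:
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of each token under class c.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label

def accuracy(model: NaiveBayes, docs: list[list[str]], labels: list[bool]) -> float:
    """Share of documents whose predicted label matches the known label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)

# Toy, made-up training examples (True = innovative according to the survey).
train = [
    ("We develop new software and technology.", "en", True),
    ("Innovation drives our new product line!", "en", True),
    ("Onze winkel en restaurant service.", "nl", False),
    ("Transport and delivery service for the region.", "en", False),
]
docs = [preprocess(text, lang) for text, lang, _ in train]
labels = [label for _, _, label in train]
model = NaiveBayes().fit(docs, labels)

print(model.predict(preprocess("Software technology for new products", "en")))  # True
print(accuracy(model, docs, labels))  # 1.0 on the toy training set
```

Accuracy measured on the training data itself is optimistic; an honest estimate of a figure like the 93% reported above would require websites held out from training.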
The next step was to select half a million companies with fewer than ten employees from the CBS business register. The text of these companies’ websites was then collected and classified using the algorithm. We did not know in advance whether these businesses were innovative, so the algorithm’s output served as a prediction. A manual check of a large part of the results confirmed that the algorithm also works well on small companies’ websites. Its performance was also checked against the Innovation Top 100 for small and medium-sized enterprises (SMEs) and against the websites of start-ups. In both cases, the algorithm accurately classified a very large number of these businesses as innovative. The approach worked especially well for companies with a high level of technological innovation. Our initial findings indicate that more than a third of the 500,000 websites can be classified as innovative.
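Two checks in this step come down to simple proportions: the share of websites predicted to be innovative (reported above as more than a third) and the agreement between the predictions and a manually checked sample. A minimal sketch, with hypothetical site identifiers and labels standing in for real data:

```python
def share_innovative(predictions: list[bool]) -> float:
    """Fraction of websites the classifier labels as innovative."""
    if not predictions:
        raise ValueError("no predictions")
    return sum(predictions) / len(predictions)

def agreement_rate(predicted: dict[str, bool], manual: dict[str, bool]) -> float:
    """Fraction of manually checked websites whose manual label matches the prediction."""
    checked = [site for site in manual if site in predicted]
    if not checked:
        raise ValueError("no overlap between predictions and manual checks")
    matches = sum(predicted[site] == manual[site] for site in checked)
    return matches / len(checked)

# Hypothetical predictions for five small-company websites.
predicted = {"site-a": True, "site-b": False, "site-c": False, "site-d": True, "site-e": False}
# Hypothetical manual labels for a checked subset of those websites.
manual = {"site-a": True, "site-b": False, "site-d": False}

print(share_innovative(list(predicted.values())))  # 0.4
print(agreement_rate(predicted, manual))  # 0.6666666666666666
```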
The maps show information on more than half a million small businesses at provincial and municipal level, based on each business’s postcode. The provinces with the most innovative businesses, both small and large, are Noord-Holland, Zuid-Holland and Noord-Brabant.
However, compared with larger companies, a slightly larger proportion of small innovative businesses is found in the other provinces. Good-quality data on this group was previously lacking. The new method makes it possible to draw more detailed maps, for instance at municipal level, revealing the areas of the Netherlands with a relatively large number of small innovative businesses.
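Mapping the results comes down to linking each business’s postcode to a region and counting the businesses classified as innovative per region. A minimal sketch, assuming a lookup from four-digit postcode area to municipality and province; the three-entry `POSTCODE_REGION` table below is an illustrative excerpt, and a real pipeline would use a complete postcode-to-municipality table. Like the maps, it counts businesses, not employees.

```python
from collections import Counter

# Illustrative excerpt of a lookup from 4-digit postcode area to (municipality, province).
POSTCODE_REGION = {
    "1012": ("Amsterdam", "Noord-Holland"),
    "3011": ("Rotterdam", "Zuid-Holland"),
    "5611": ("Eindhoven", "Noord-Brabant"),
}

def count_by_region(businesses: list[tuple[str, bool]], level: str) -> Counter:
    """Count businesses predicted to be innovative per municipality or province.

    businesses: (postcode, is_innovative) pairs, with postcodes like "1012 AB".
    level: "municipality" or "province".
    Postcodes missing from the lookup are skipped.
    """
    index = {"municipality": 0, "province": 1}[level]
    counts: Counter = Counter()
    for postcode, is_innovative in businesses:
        region = POSTCODE_REGION.get(postcode.replace(" ", "")[:4])
        if region is not None and is_innovative:
            # Each business adds 1, regardless of how many people it employs.
            counts[region[index]] += 1
    return counts

sample = [("1012 AB", True), ("1012 CD", True), ("3011 EF", False), ("5611 GH", True)]
print(count_by_region(sample, "province"))
print(count_by_region(sample, "municipality"))
```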
These areas are mainly found in the large cities, particularly Amsterdam and Rotterdam, and in municipalities with universities and universities of applied sciences, especially technical universities. Note that the maps show the absolute number of companies and therefore do not indicate how many people these companies employ. In other words, a tiny innovative start-up with a single worker counts the same as a company with nine employees. Nor does this study look at the amount invested in innovation.