Classifying businesses by economic activity
We evaluated a number of methodological aspects of the machine learning techniques: different types of feature selection, different word-weighting methods and different classifiers. We also varied the conditions under which we applied text mining; for instance, we compared classification performance for one-man businesses with that for larger businesses.
Our best-performing method reached an accuracy of 51% at the top-sector level, while accuracy at the sub-sector level was much lower. In the discussion, we present several ideas for improving performance.
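
To make the notion of a word-weighting method concrete, the following is a minimal sketch of one common scheme, TF-IDF, written in plain Python. It is an illustration only: the text above does not state which weighting schemes were compared, and the function name `tfidf` and the example documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumption: weight = relative term frequency * log(N / document frequency).
    This is one standard TF-IDF variant, not necessarily the one used in the study.
    """
    n = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {term: (count / len(doc)) * math.log(n / df[term])
             for term, count in tf.items()}
        )
    return weights


# Hypothetical, already tokenised business descriptions.
docs = [
    ["bakery", "bread", "fresh", "bread"],
    ["plumber", "repair", "fresh", "quotes"],
    ["bakery", "cakes", "orders"],
]
weights = tfidf(docs)
```

In this sketch, a term that is frequent in one description but rare across the collection (here "bread") receives a higher weight than a term shared by several descriptions (here "fresh"), which is the property a classifier can exploit to separate sectors.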