Automatically Coding Occupation Titles to a Standard Occupation Classification
Occupation coding is the process of classifying job titles into one or more categories, which are usually organized into a hierarchy. Historically, job titles were coded to standard classifications manually. However, the drawbacks of manual coding have led researchers to develop automatic methods for occupation coding. We compare classic machine learning approaches with deep learning approaches for classifying job titles to the Standard Occupational Classification (SOC). We implement flat and hierarchical models using Naïve Bayes, Maximum Entropy (MaxEnt), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN) to code job titles to SOC. For this purpose, we collected 65,962 SOC-labeled job titles from publicly available sources. These job titles are extremely short, averaging three words per title. Our experimental results show that MaxEnt, SVM, and CNN perform similarly to one another and better than Naïve Bayes at coding job titles to SOC.
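The difference between a flat and a hierarchical model can be illustrated with a minimal sketch. It is not the implementation evaluated here: it assumes a small multinomial Naïve Bayes with add-one smoothing over whitespace tokens, and it assumes that the hierarchical variant first predicts the two-digit major group and then applies a per-group classifier to choose the detailed code. The class names NaiveBayes and HierarchicalCoder, the toy job titles, and the title-to-code pairs are all hypothetical and serve only to show SOC-style labels.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens (add-one smoothing)."""

    def fit(self, titles, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for title, label in zip(titles, labels):
            self.word_counts[label].update(title.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, title):
        # Words never seen in training carry no evidence, so they are skipped.
        words = [w for w in title.lower().split() if w in self.vocab]
        total = sum(self.label_counts.values())

        def log_score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            prior = math.log(self.label_counts[label] / total)
            return prior + sum(math.log((counts[w] + 1) / denom) for w in words)

        return max(self.label_counts, key=log_score)


class HierarchicalCoder:
    """Two-stage coder (assumed design): major group first, then detailed code."""

    def fit(self, titles, codes):
        groups = [code.split("-")[0] for code in codes]
        self.top = NaiveBayes().fit(titles, groups)
        self.children = {}
        for group in set(groups):
            rows = [(t, c) for t, c, g in zip(titles, codes, groups) if g == group]
            child_titles, child_codes = zip(*rows)
            self.children[group] = NaiveBayes().fit(child_titles, child_codes)
        return self

    def predict(self, title):
        group = self.top.predict(title)
        return self.children[group].predict(title)


# Toy data for illustration only; titles and code assignments are hypothetical.
TITLES = [
    "software developer", "senior software engineer",
    "systems analyst", "business systems analyst",
    "registered nurse", "staff nurse",
    "family physician", "family medicine doctor",
]
CODES = [
    "15-1252", "15-1252", "15-1211", "15-1211",
    "29-1141", "29-1141", "29-1215", "29-1215",
]

flat = NaiveBayes().fit(TITLES, CODES)          # one classifier over all codes
hier = HierarchicalCoder().fit(TITLES, CODES)   # group classifier + per-group classifiers

for query in ["software engineer", "staff nurse"]:
    print(query, flat.predict(query), hier.predict(query))
```

In the flat model a single classifier chooses among all detailed codes at once, whereas the hierarchical model narrows the candidates at each level. An error at the top level of the hierarchical model cannot be recovered further down, which is one reason the two designs are worth comparing.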