Tuesday, May 16, 2023

HPCC Summer Internship - 2023

 

With increasing digitalization, more and more processes are becoming automated. One example is the hiring of applicants for positions in companies. Because of the overwhelming volume of resumes received for the same jobs and positions, recruiting has become time-consuming and challenging. Consequently, there is a growing demand for technologies that can automate this procedure so that resumes can be screened, sorted, and prioritized. Depending on the position and the applicant, resumes typically contain a lot of unstructured content and can take many formats.


I aim to prototype a system that structures the data acquired from resumes and extracts critical information, so that candidates can be classified by location, age, experience, and skill set. The prototype will be built in NLP++, a language designed specifically for natural language processing. General-purpose NLP models help segment text but do not perform linguistic analysis, which leads to a loss of information.


The proposed work uses NLP++ to create a custom analyzer that examines every statement in a resume and segments and classifies the data accordingly. The segmented data (generally a parse tree) is then parsed by applying a series of NLP++ analyzer passes, and information is extracted with the help of NLP++ rules and functions. The information that can generally be extracted includes name, email, qualifications, skills, and location.
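To make the idea of successive passes concrete, here is a minimal sketch in Python rather than NLP++. It is only an illustration under stated assumptions: the section headers, the function names (`segment_pass`, `extract_pass`, `analyze`), and the rule that the first line holds the candidate's name are all hypothetical, and a real NLP++ analyzer would work on a parse tree with its own rules and functions.

```python
import re

# Assumption: a simple pattern that covers common email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumption: a hypothetical set of section headers found in resumes.
SECTION_HEADERS = {"skills", "education", "experience"}


def segment_pass(text):
    """Pass 1: group non-empty lines under the most recent section header."""
    sections = {"header": []}
    current = "header"
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        key = stripped.rstrip(":").lower()
        if key in SECTION_HEADERS:
            current = key
            sections.setdefault(current, [])
        else:
            sections[current].append(stripped)
    return sections


def extract_pass(sections):
    """Pass 2: pull individual fields out of the segmented text."""
    header_lines = sections["header"]
    match = EMAIL_RE.search(" ".join(header_lines))
    skills = []
    for line in sections.get("skills", []):
        skills.extend(s.strip() for s in line.split(",") if s.strip())
    return {
        # Assumption: the first line of the resume is the candidate's name.
        "name": header_lines[0] if header_lines else None,
        "email": match.group(0) if match else None,
        "skills": skills,
    }


def analyze(text):
    """Run the passes in order, each consuming the previous pass's output."""
    return extract_pass(segment_pass(text))
```

Each pass consumes the output of the one before it, which mirrors the way an analyzer sequence refines the parse tree step by step.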


A knowledge base will also be built for the different phrasings and components that pertain to a resume. These can be assessed and quantified by parsing candidates' personal statements, aims, and experiences. The system may also keep track of candidate portfolios by providing an extensive view of the data.
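As a rough illustration of how a knowledge base of phrasings could be used, the Python sketch below maps a few phrases to the concept they signal and counts which ones appear in a piece of text. The entries in `KNOWLEDGE_BASE` and the function name `tag_phrases` are hypothetical, and this flat dictionary is only a stand-in for the knowledge base the project would build in NLP++.

```python
import re

# Assumption: a tiny hand-written knowledge base, concept -> known phrasings.
KNOWLEDGE_BASE = {
    "skill": {"python", "ecl", "machine learning"},
    "degree": {"b.tech", "bachelor of technology", "m.sc"},
}


def tag_phrases(text, kb=KNOWLEDGE_BASE):
    """Return, per concept, the sorted list of known phrases found in text."""
    lowered = text.lower()
    found = {}
    for concept, phrases in kb.items():
        hits = []
        for phrase in phrases:
            # Match whole phrases only, so "ecl" does not fire on "declared".
            pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
            if re.search(pattern, lowered):
                hits.append(phrase)
        if hits:
            found[concept] = sorted(hits)
    return found
```

Counting the hits per concept gives a simple way to quantify a personal statement or experience section, for example by the number of distinct skills it mentions.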

