WEEK 6

 

DAY 1&2:

Today I started by improving the education analyzer. I noticed that it did not handle names that carry a title or designation, such as Dr. B.R. ----, so I wrote a rule to handle this (and other designations).
The next problem was the possessive "'s": tokenization splits it into separate tokens, which broke the name sequence. I modified the existing functions to deal with this. If the node is a lowercase s, we check whether it is preceded by an apostrophe (\') and, if so, include it in the name sequence. When I ran it on examples, it worked well.
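The two fixes above can be sketched roughly as follows. This is Python rather than the NLP++ used in the analyzer, and the title list, the function names and the sample name are all assumptions made for illustration, not the actual rules.

```python
import re

# Hypothetical list of titles; the post only mentions "Dr." explicitly.
TITLES = {"Dr", "Prof", "Mr", "Mrs", "Ms"}

def tokenize(text):
    # Words, numbers and single punctuation marks, so "Rao's" -> "Rao", "'", "s".
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)

def name_sequences(tokens):
    """Group tokens into name sequences, keeping titles, initials and "'s"."""
    sequences, current = [], []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in TITLES or tok[0].isupper():
            current.append(tok)
        elif tok == "." and current and (
            current[-1] in TITLES
            or (len(current[-1]) == 1 and current[-1].isalpha())
        ):
            # A period that follows a title or a single-letter initial.
            current.append(tok)
        elif tok == "'" and nxt == "s" and current:
            # Possessive: merge the apostrophe and the lowercase s.
            current.append("'s")
            i += 1  # skip the "s" token
        else:
            if current:
                sequences.append(current)
            current = []
        i += 1
    if current:
        sequences.append(current)
    return sequences

tokens = tokenize("studied at Dr. A.B. Example's Institute in 2015")
print(name_sequences(tokens))
```

Without the possessive branch, the apostrophe would end the sequence and "Institute" would start a new, separate one.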


Next, I went back to the main analyzer and found some inaccuracies. First, the header pass had started making mistakes: it was isolating any long sequence of capitalized words as a header. I added a rule requiring headers to be purely alphabetical, but this failed on another example resume. I therefore wrote a best-fit rule for this pass. It gave the correct output on the other resumes, so I kept it.
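The post does not spell out the best-fit rule, so the following is only one plausible shape for such a heuristic, written in Python for illustration. The word limit, the allowed punctuation and the name is_header are assumptions.

```python
import re

# Assumed pattern: capital letters, spaces, "&" or "/", optional trailing colon.
HEADER_RE = re.compile(r"^[A-Z][A-Z &/]*:?$")

def is_header(line, max_words=4):
    """Guess whether a line is a section header.

    Long capitalized runs (for example a full degree title) are rejected by
    the word limit, while a purely alphabetical check is relaxed just enough
    to accept headers such as "SKILLS & INTERESTS:".
    """
    text = line.strip()
    if not text or len(text.split()) > max_words:
        return False
    return bool(HEADER_RE.match(text))

for sample in ["EDUCATION", "SKILLS & INTERESTS:",
               "BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE AND ENGINEERING"]:
    print(sample, is_header(sample))
```

Any rule of this kind is a trade-off, so it needs to be re-checked whenever a new resume layout is added to the test set.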

DAY 3&4:

I made separate copies of the dates analyzer for _LINE and _PROSE.
As mentioned earlier, although @MULTI works well, it goes through all nodes at every level beneath it, which did not give me the desired output.
I therefore had to make a separate copy for each.

Next, I started working on the knowledge base. It is an important part of the resume analyzer because it takes the extracted information and arranges it hierarchically, giving a comprehensive picture of the individual.

Initially I considered separate analyzers for bullets, prose and header zones. I discarded that idea and decided that the KB should be structured as header zones, each followed by the information found under it.
I was able to get the zones and to store initial information such as the telephone number and the links used in different sections.
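As a rough picture of this layout, here is a Python sketch of a zone-first knowledge base. The real KB is built inside the NLP++ analyzer; the dictionary shape, the regular expressions and the sample data below are assumptions for illustration only.

```python
import re

# Deliberately loose, assumed patterns for phone numbers and links.
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")
LINK_RE = re.compile(r"https?://\S+|www\.\S+")

def build_kb(zones):
    """Build {header: {"lines", "phones", "links"}} from (header, lines) pairs."""
    kb = {}
    for header, lines in zones:
        entry = {"lines": list(lines), "phones": [], "links": []}
        for line in lines:
            entry["phones"].extend(PHONE_RE.findall(line))
            entry["links"].extend(LINK_RE.findall(line))
        kb[header] = entry
    return kb

# Made-up sample zones.
zones = [
    ("CONTACT", ["Phone: +91 98765 43210", "https://example.com/me"]),
    ("PROJECTS", ["Resume parser, see www.example.org/parser"]),
]
kb = build_kb(zones)
print(kb["CONTACT"]["phones"], kb["PROJECTS"]["links"])
```

Keeping the header zone as the top-level key means that later passes (dates, grades, institutes) only have to decide which zone they are in before adding their own fields.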

Next I started on dates, but this is not straightforward because durations (date ranges) must be stored together with the text they refer to.
Example:
In education, an individual may mention more than one institution, each with the time spent there, so every date range has to be attached to the right institution.
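One simple way to keep a range tied to its text is to extract both from the same line. The Python sketch below assumes year-only ranges written on the same line as the institution; the pattern, the function name attach_ranges and the sample lines are hypothetical and much cruder than the dates analyzer itself.

```python
import re

# Assumed format: "2015 - 2019", "2013 to 2015" or "2019 - Present".
RANGE_RE = re.compile(
    r"\b((?:19|20)\d{2})\s*(?:[-\u2013]|to)\s*((?:19|20)\d{2}|[Pp]resent)\b"
)

def attach_ranges(lines):
    """Return one record per line that has a date range, with its remaining text."""
    entries = []
    for line in lines:
        match = RANGE_RE.search(line)
        if not match:
            continue
        # Whatever is left of the line once the range is cut out is the label.
        label = (line[:match.start()] + line[match.end():]).strip(" ,;:-()")
        entries.append(
            {"text": label, "start": match.group(1), "end": match.group(2)}
        )
    return entries

# Made-up education lines.
education = [
    "Example Institute of Technology, 2015 - 2019",
    "Example Public School (2013 to 2015)",
]
for entry in attach_ranges(education):
    print(entry)
```

When the range sits on a different line from the institution, this per-line pairing breaks down and the range has to be attached to the nearest entry in the same zone instead.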

DAY 5:

I continued building the knowledge base by adding functions for this. For the education part, I used some logic and a global variable to keep adding information for each subcomponent. With this, I was able to add the grades and duration of each education subsection separately.
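The global-variable idea can be illustrated with a small Python sketch. In the analyzer this is done with NLP++ functions and a global variable; the names KB, start_entry and add_field, and the sample values, are made up for this example.

```python
# Knowledge base with one list of entries for the education zone.
KB = {"EDUCATION": []}

# Global pointer to the education subsection currently being filled.
_current = None

def start_entry():
    """Open a new education subsection and make it the current one."""
    global _current
    _current = {}
    KB["EDUCATION"].append(_current)

def add_field(key, value):
    """Add a field (grade, duration, ...) to the current subsection."""
    if _current is None:
        start_entry()
    _current[key] = value

# Made-up values: two subsections filled one after the other.
start_entry()
add_field("duration", "2015-2019")
add_field("grade", "8.5 CGPA")
start_entry()
add_field("grade", "92%")
print(KB["EDUCATION"])
```

Because every call writes to whichever subsection is current, passes that run later can add their fields without knowing how many subsections came before.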

Then I integrated my education analyzer subcomponent with my main analyzer.
Before this, however, I wrote a rule and a function to look for the city address of the institution. I had some small problems, but these were easily dealt with. Special care must be taken to change the @PATHS properly when doing this. Since I was now also able to get the institute name, I added it to the knowledge base as well, using the same logic as before. When I ran this, I got the correct outputs for now.