WEEK 4

  

DAY 1:

This week started with brainstorming the next part of my work. I had two options:
  • Isolating the prose in the resume
  • Detailed analysis of components such as Education and Skills
When I discussed this with my mentor in Monday's meeting, he told me to focus primarily on prose, as it was harder and required more rework.
Therefore, I started with the prose.
I went through the resumes I had collected to work out the best approach.
The following observations were made (keeping in mind that the text is already split into lines):
  • If a _LINE starts with an uncapitalized word, it belongs to the previous chain of lines.
  • If a _LINE ends with a "-" or with a word like of, for, with, along, etc., the next line belongs to the same chain of lines.
  • If a _LINE contains either of the following, the lines up to the matching close belong to the same chain:
    • Unclosed brackets (the bracket opens in one _LINE but closes in some other _LINE)
    • Unclosed quotations (the same condition as above)
Based on these observations, I started working on the analyzer sequence.
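The first two observations can be illustrated outside the analyzer with a small Python sketch. This is not the analyzer code itself: the function names and the connector word set are assumptions made for illustration (the list above only names "of, for, with, along, etc."), and bracket and quotation handling is left out.

```python
# Illustrative connector words; extend as needed (assumed set).
TRAILING_CONNECTORS = {"of", "for", "with", "along"}

def continues_previous(line: str) -> bool:
    """True if the line starts with an uncapitalized word."""
    stripped = line.lstrip()
    return bool(stripped) and stripped[0].islower()

def expects_continuation(line: str) -> bool:
    """True if the line ends with '-' or with a connector word."""
    stripped = line.rstrip()
    if not stripped:
        return False
    if stripped.endswith("-"):
        return True
    return stripped.split()[-1].lower() in TRAILING_CONNECTORS

def chain_lines(lines):
    """Group consecutive lines that appear to belong to one sentence."""
    chains = []
    for line in lines:
        if chains and (continues_previous(line)
                       or expects_continuation(chains[-1][-1])):
            chains[-1].append(line)
        else:
            chains.append([line])
    return chains

resume_lines = [
    "Worked on the development of",
    "Machine learning models",
    "using Python.",
    "Skills",
]
chains = chain_lines(resume_lines)
```

Here the first three lines end up in one chain and "Skills" starts a new one, because it is capitalized and the line before it does not end with a hyphen or connector.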


DAY 2:

Today, I continued working on the analyzer sequence.
  • For brackets (and quotations), my basic idea was to go through all the lines once, looking for properly matched bracket sequences. If a bracket opens and closes in the same line, I put it under a node _bracket.
  • After this, I went through all the lines again looking for any unmatched open bracket, which indicates that the closing bracket must be in the following _LINE(s). Those lines are therefore gathered under one _LINE. After this, I spliced the _bracket node.
  • Next, I worked on "-" and words like of, for, with, along, etc. When these occur at the end of a _LINE, the following line belongs to the prose (irrespective of whether it starts with an uncapitalized word or not).
  • For a line starting with an uncapitalized word, I added an attribute to the line indicating that it should be part of the prose.
  • Additionally, a _LINE ending with ".", "?" or "!" also belongs to the previous sequence of lines. After this, I gathered all such lines together as one prose.
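The bracket step above can be approximated in Python as follows. This is a minimal sketch under stated assumptions, not the analyzer itself: it handles only round brackets, the function names are hypothetical, and the `max_extra` limit is an added guard so that a bracket that never closes does not swallow the rest of the text.

```python
def open_depth(text: str, open_ch: str = "(", close_ch: str = ")") -> int:
    """Count brackets that are opened in text and still open at its end."""
    depth = 0
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
    return depth

def gather_spanning_brackets(lines, max_extra: int = 3):
    """Merge a line with the following lines until its open bracket closes."""
    gathered = []
    i = 0
    while i < len(lines):
        merged = lines[i]
        j = i + 1
        # Keep absorbing following lines while a bracket is still open.
        while open_depth(merged) > 0 and j < len(lines) and j - i <= max_extra:
            merged = merged + " " + lines[j]
            j += 1
        if open_depth(merged) > 0:
            # The bracket never closed: leave the original line untouched.
            gathered.append(lines[i])
            i += 1
        else:
            gathered.append(merged)
            i = j
    return gathered

sample = [
    "Built a parser (using",
    "NLP++ and Python) for resumes",
    "Skills (Java)",
    "Note (unfinished",
    "Next",
]
merged_lines = gather_spanning_brackets(sample)
```

Lines whose brackets open and close on the same line pass through unchanged, which corresponds to the first pass described above.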
This is a glimpse of the parse tree before and after the application of the analyzer:

BEFORE: [image: parse tree before applying the analyzer]

AFTER: [image: parse tree after applying the analyzer]

I also started building a dictionary of all programming languages.

DAY 3:

Today, I found some flaws in the bracket analyzer. I had erroneously assumed that if a bracket opens on one line, it always closes on one of the following lines. When I changed the analyzer to fix this, I found that the problem lay deeper: I had not dealt with other opening and closing sequences like \{ \" \[ etc., or with indented brackets.
So I reworked the sequence accordingly. I built a dictionary of all these characters, with different attribute values to record specifically which one opens and which one closes, so that the analyzer does not mistake { text.... ] for a proper sequence.
However, this led to many conflicting cases and made the analyzer considerably more complex.
Therefore, I put the indented brackets on hold.
I will discuss this with my mentor tomorrow.
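The idea of a dictionary that records which character opens and which one closes can be sketched in Python with a stack. This is only an illustration of the idea, not the analyzer's dictionary: the names `PAIRS` and `is_proper_sequence` are hypothetical, and quotations are left out because the same character both opens and closes them, so they would need separate handling.

```python
# Each opening character maps to the only closer that may match it.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())

def is_proper_sequence(text: str) -> bool:
    """True if every bracket in text is closed by its own kind, in order."""
    expected = []  # stack of closers still waiting to be seen
    for ch in text:
        if ch in PAIRS:
            expected.append(PAIRS[ch])
        elif ch in CLOSERS:
            if not expected or expected.pop() != ch:
                return False
    return not expected

mismatched = is_proper_sequence("{ text.... ]")
matched = is_proper_sequence("{ text [ more ] }")
```

With this check, a span that opens with { and ends with ] is rejected instead of being treated as one bracket sequence.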

DAY 4:

I commenced today with bullets. A bullet is any line that begins with a character like •, -, * or ○, followed by any textual content.
A bullet item is generally all the text that lies between one bullet character and the next, or up to a line that ends with a full stop. This is something I observed in the resumes I collected, so I wrote the rules accordingly to gather the text up.
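The bullet rule can be sketched in Python like this. It is a simplified illustration rather than the rules used in the analyzer: the function names are hypothetical, and text that appears outside any bullet is simply skipped here.

```python
BULLET_MARKERS = ("•", "-", "*", "○")

def is_bullet_start(line: str) -> bool:
    """True if the line begins with a bullet character followed by text."""
    stripped = line.lstrip()
    return (len(stripped) > 1
            and stripped[0] in BULLET_MARKERS
            and stripped[1:].strip() != "")

def gather_bullets(lines):
    """Collect the text of each bullet, including its continuation lines."""
    items = []
    current = None  # text of the bullet being built, if any
    for line in lines:
        text = line.strip()
        if is_bullet_start(line):
            if current is not None:
                items.append(current)  # the next bullet ends the previous one
            current = text[1:].strip()
        elif current is not None and text:
            current += " " + text      # continuation of the open bullet
        else:
            continue                   # blank line or text outside a bullet
        if current.endswith("."):
            items.append(current)      # a full stop ends the bullet
            current = None
    if current is not None:
        items.append(current)
    return items

sample = [
    "• Developed a resume parser",
    "that extracts prose.",
    "Education",
    "- Led a team of",
    "five engineers",
    "* Python",
]
bullets = gather_bullets(sample)
```

In this sample the first bullet is closed by its full stop, so "Education" is not absorbed into it, while the second bullet is closed by the start of the third.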

This helped gather more lines together, so that some brackets and dates spanning several lines were also gathered.

DAY 5:

Today, I integrated the prose analyzer sequence with the rest of my analyzer.
I faced a lot of issues with the sequencing of passes, which I spent time changing and rechecking. I also had to add a pass for lines that begin with a word which generally occurs in the middle of a sentence (even if it is capitalized), so that they are gathered with the lines above them as prose.
The outcome of this week was the gathering of prose and bullets, and the closing of bracket sequences.
