WEEK 8

         

DAY 1:


Today I started on the date range analyzer, i.e., finding the duration that has passed between two dates, in whatever format the dates are given. Here are some example date ranges:




To start, I made a simplifying assumption:
When a person mentions a date range, they will usually use the same format for both dates in the range.
This can be seen from the examples above.
I identified 7 primary date formats, which are the most common ones.
To find out which "type" a date is, I assigned a different value to each format. Based on the date type, I wrote a specific function for each type that extracts the day, month, and year from both dates. I then created another function that takes the year, month, and day of date1 and date2 and calculates the duration.
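The type-code idea above can be sketched in Python. The post does not list its 7 formats, so the three patterns below (day/month/year with slashes, month name plus year, and year alone), the type numbers, and the names `date_type`, `parse_date`, and `parse_range` are hypothetical; the slash format is also assumed to be day first. Following the assumption stated above, only the first date's type is detected and the same parser is reused for the second date.

```python
import re

# Hypothetical format table: the post's 7 formats are not listed,
# so three illustrative ones are used here.
MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

PATTERNS = {
    1: re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$"),  # 12/05/2019 (assumed day first)
    2: re.compile(r"^([A-Za-z]+)\.?\s+(\d{4})$"),      # May 2019 / Oct. 2019
    3: re.compile(r"^(\d{4})$"),                       # 2019
}


def date_type(text):
    """Return the type code of the first matching format, or None."""
    for code, pattern in PATTERNS.items():
        if pattern.match(text.strip()):
            return code
    return None


def parse_date(text, code):
    """Extract (year, month, day); parts missing from the text are None."""
    m = PATTERNS[code].match(text.strip())
    if m is None:
        raise ValueError(f"{text!r} does not match date type {code}")
    if code == 1:
        return int(m.group(3)), int(m.group(2)), int(m.group(1))
    if code == 2:
        return int(m.group(2)), MONTHS[m.group(1)[:3].lower()], None
    return int(m.group(1)), None, None


def parse_range(text):
    """Split 'A - B' or 'A to B' and parse both sides with A's type,
    assuming both dates in a range share one format."""
    start, end = re.split(r"\s*-\s*|\s+to\s+", text.strip(), maxsplit=1)
    code = date_type(start)
    if code is None:
        raise ValueError(f"unrecognized date format: {start!r}")
    return parse_date(start, code), parse_date(end, code)


print(parse_range("12/05/2019 - 03/07/2020"))   # ((2019, 5, 12), (2020, 7, 3))
print(parse_range("October 2019 to May 2020"))  # ((2019, 10, None), (2020, 5, None))
```

The separator pattern requires whitespace around "to" so that month names such as "October" are not split in the middle.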
For the calculation, I came up with my own approach, which I felt was the best one: finding the number of days between the two dates.
This was the final logic after a lot of trial and error.
I did this for each of the date types, which gave a separate function per type, and by writing debug text files I checked the date range generated in days. This worked perfectly for the different date types.
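The post's own day-counting logic is not shown, so the sketch below swaps in Python's standard `datetime.date` subtraction to count the days between two (year, month, day) triples. The helper names `to_date` and `days_between` are hypothetical, and a missing month or day defaults to 1, the placeholder convention the post describes on Day 5.

```python
from datetime import date


def to_date(year, month=None, day=None):
    """Build a date; a missing month or day defaults to 1 (placeholder only)."""
    return date(year, month or 1, day or 1)


def days_between(start, end):
    """Days from start to end, each given as a (year, month, day) tuple
    whose month and day may be None."""
    return (to_date(*end) - to_date(*start)).days


print(days_between((2019, 5, 12), (2020, 7, 3)))    # 418
print(days_between((2020, 1, None), (2020, 3, 1)))  # 60 (2020 is a leap year)
```

Using the standard library keeps leap years and month lengths out of the hand-written code.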
After this, I faced two problems:
  • How to incorporate "present"
  • How to put the day, month, and year information onto the KB (since this was being done for dates belonging to a date range, not for generic standalone dates)

DAY 2:


I started today by addressing yesterday's problems: incorporating "present" and putting the information onto the knowledge base.

I started with the second one. For this, I went through each specific rule, found out how the day, month, and year information was being handled, and accordingly put the normalized information as attributes on the corresponding node in the tree.
This was done for all the date rules.

Example:


Next, for the date range, I called the date range calculator on both dates and put the date range in the format:

                                    "no_of_days":"no_of_months":"no_of_years".

While doing this, I realized that the normalized date could simply be sent to the date range calculator, which made my previous code redundant, so I replaced it entirely. This approach also made it easy to incorporate "present".
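Once both ends of a range are normalized, the calculator only has to handle one format, and "present" simply becomes today's date. The sketch below assumes normalized dates are ISO `YYYY-MM-DD` strings and reads the "no_of_days":"no_of_months":"no_of_years" format as the same total duration expressed in whole days, whole months, and whole years; both readings are assumptions, and the function names are hypothetical.

```python
from datetime import date


def parse_normalized(text, today=None):
    """Turn a normalized 'YYYY-MM-DD' string, or 'present', into a date."""
    text = text.strip()
    if text.lower() == "present":
        return today or date.today()
    return date.fromisoformat(text)


def whole_months(start, end):
    """Number of complete calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months


def date_range(start_text, end_text, today=None):
    """Return 'days:months:years' for the whole span (assumed reading)."""
    start = parse_normalized(start_text, today)
    end = parse_normalized(end_text, today)
    months = whole_months(start, end)
    return f"{(end - start).days}:{months}:{months // 12}"


print(date_range("2019-05-12", "2020-07-03"))                       # 418:13:1
print(date_range("2020-01-01", "present", today=date(2021, 1, 1)))  # 366:12:1
```

Passing `today` explicitly keeps results reproducible in tests; otherwise "present" falls back to `date.today()`.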

Here is how I called the function:

 

DAY 3,4:


I spent these days solely on putting information onto the KB. As guided by my mentor, I elaborated on the information found in telephone numbers, dates, and grades.
The first rule of NLP++ is to never lose the exact textual information. For example, I was normalizing grades because I felt it looked better on the Knowledge Base.
But doing this lost the real information:
E.g.: CGPA 10: 88%
SGPA 9.4: 84%
Only the percentage was displayed on the KB, so we could not tell what type of grade was originally given.
So I changed it to look like this:


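The screenshot of the reworked KB entry is not reproduced here, but the principle of keeping the grade exactly as written alongside a normalized value can be sketched as follows. The `TYPE value/scale` input shape, the attribute names, and the linear percentage conversion are all hypothetical; real CGPA-to-percentage conversions vary by institution.

```python
import re

# Hypothetical input shape: "CGPA 8.8/10", "SGPA 9.4 / 10", "GPA 3.6/4".
GRADE = re.compile(r"^(CGPA|SGPA|GPA)\s*([\d.]+)\s*/\s*([\d.]+)$", re.IGNORECASE)


def grade_entry(text):
    """Keep the grade exactly as written, plus a normalized percentage."""
    entry = {"raw": text}
    m = GRADE.match(text.strip())
    if m is None:
        return entry  # unknown shape: store only the original text
    kind, value, scale = m.groups()
    entry.update(
        type=kind.upper(),
        value=value,  # kept as text, exactly as written
        scale=scale,
        # hypothetical linear conversion, not a standard formula
        percent=round(float(value) / float(scale) * 100, 1),
    )
    return entry


print(grade_entry("CGPA 8.8/10"))
```

Because `raw`, `type`, and `scale` survive next to `percent`, the KB still shows what kind of grade the person actually wrote.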
Next, I also worked on the telephone number types:



I was also able to put the date and date range information onto the KB. This was relatively easy, as the values were already attributes on the _date nodes.
This was the function doing it:


DAY 5:


I worked on further refining the knowledge base. For dates with missing values, I had initially assumed day = 1 and month = 1. I put this information on the KB, but in a way it misrepresents what the person actually wrote: it makes it look as if every date started on the 1st of the month.
Therefore I removed it, put only the information that was actually mentioned, and added the normalized date separately.
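A minimal sketch of this change, assuming a KB entry can be modeled as a Python dict (the function and attribute names are hypothetical): only the parts that appear in the text are stored, and the default of 1 for a missing month or day survives only inside the separate normalized value.

```python
from datetime import date


def date_kb_entry(year, month=None, day=None):
    """KB attributes for a date: only the parts actually written,
    plus a separate normalized date whose missing parts default to 1."""
    entry = {"year": year}
    if month is not None:
        entry["month"] = month
    if day is not None:
        entry["day"] = day
    # The placeholder 1s appear only here, so the KB never claims the
    # text said "the 1st of the month".
    entry["normalized"] = date(year, month or 1, day or 1).isoformat()
    return entry


print(date_kb_entry(2019, 5))  # {'year': 2019, 'month': 5, 'normalized': '2019-05-01'}
```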

I also modified the phone numbers to show the type, if known, along with the other components of the number: the area code, the prefix, and the station.
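The phone number breakdown can be sketched with a single regular expression. This assumes North American style numbers (3-digit area code, 3-digit prefix, 4-digit station) and an optional type label from a small hypothetical list; the labels, the regex, and `phone_entry` are illustrations, not the post's actual NLP++ rules.

```python
import re

# Assumes North American style numbers; the type labels are hypothetical.
PHONE = re.compile(
    r"^(?:(?P<type>home|cell|mobile|work|fax)\s*:?\s*)?"
    r"\(?(?P<area>\d{3})\)?[\s.-]*"
    r"(?P<prefix>\d{3})[\s.-]*"
    r"(?P<station>\d{4})$",
    re.IGNORECASE,
)


def phone_entry(text):
    """Split a phone number into type (if stated), area code, prefix and station."""
    m = PHONE.match(text.strip())
    if m is None:
        return None
    # Drop the type key when no label was written, so it is never guessed.
    entry = {k: v for k, v in m.groupdict().items() if v is not None}
    if "type" in entry:
        entry["type"] = entry["type"].lower()
    entry["raw"] = text  # the exact original text is always kept
    return entry


print(phone_entry("Cell: (555) 123-4567"))
```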
