Yazhini

Data Scientist @Tekclan

POS tagging is an NLP step that converts a sentence into a list of words, each tagged with its part of speech. The parts of speech include:

  1. Noun
  2. Verb
  3. Adjective
  4. Adverb
  5. Pronoun
  6. Preposition
  7. Conjunction
  8. Interjection
  9. Article or Determiner

A part-of-speech tag is assigned to a word to show what role the word plays in the sentence. Let's see how POS tagging helps in identifying the sense of a word.

Scenarios where POS tagging is used

1. Word sense disambiguation:
Eg: ‘She saw a bear’ and ‘Your efforts will bear fruit’
Here the word ‘bear’ is a homonym: it is a noun in the first sentence and a verb in the second, so the POS tag tells the two senses apart.

2. Named entity recognition:
Eg: ‘Jawaharlal Nehru was the first prime minister of India’ and ‘Jawaharlal Nehru stadium is more crowded today’
Here ‘Jawaharlal Nehru’ refers to a person in the first sentence and to a stadium in the second.

3. Co-reference resolution:
Eg: Bill said he would come
Here ‘Bill’ and ‘he’ refer to the same person.

These are some scenarios where parts of speech play a major role in identifying the sense of a word in a sentence.

POS Tagger

It is a piece of software that assigns a POS tag to each word in a sentence. Here are a few of the tags that taggers use (these come from the Penn Treebank tag set):

  • CC - coordinating conjunction
  • DT - determiner
  • EX - existential there (like: “there is”, “there exists”)
  • FW - foreign word
  • IN - preposition
  • JJ - adjective ‘big’
  • JJR - adjective, comparative ‘bigger’
  • JJS - adjective, superlative ‘biggest’
  • LS - list marker
  • MD - modal ‘could’, ‘will’
  • NN - noun, singular ‘desk’
  • RB - adverb ‘very’, ‘silently’
  • VBZ - verb, 3rd person singular present ‘takes’

Types of tagging

1. Lexical tagging:

This assigns each word the tag that it most often carries in the training corpus.
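The idea can be sketched in a few lines of plain Python. This is only a minimal illustration: the tiny tagged corpus, the function names `train_lexical_tagger` and `lexical_tag`, and the fallback tag `NN` for unseen words are all assumptions made for the example, not part of any library.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged corpus, invented for illustration only.
tagged_corpus = [
    ("the", "DT"), ("bear", "NN"), ("sleeps", "VBZ"),
    ("a", "DT"), ("bear", "NN"), ("runs", "VBZ"),
    ("efforts", "NNS"), ("bear", "VB"), ("fruit", "NN"),
]

def train_lexical_tagger(corpus):
    """Map each word to the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for word, tag in corpus:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def lexical_tag(words, model, default="NN"):
    """Tag each word with its most frequent tag; unseen words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]

model = train_lexical_tagger(tagged_corpus)
print(lexical_tag(["The", "bear", "sleeps"], model))
# [('The', 'DT'), ('bear', 'NN'), ('sleeps', 'VBZ')]
```

Note that ‘bear’ always comes out as a noun here, because the tagger ignores context. That limitation is what the other tagging types try to address.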

2. Rule based tagging:

This assigns tags based on hand-written rules.
Example: The big mountain has more trees
If the word before the target word is a determiner and the word after it is a noun, then the target word should be an adjective. In this example ‘big’ is the target word, and rule-based tagging tags it as an adjective.
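The determiner/noun rule described above could be written roughly like this. It is a sketch under assumptions: the starting tags for the other words are supplied by hand, `None` marks the word still to be resolved, and `apply_adjective_rule` is a name made up for this example.

```python
# Hypothetical starting tags; None marks the word the rule still has to resolve.
sentence = [("The", "DT"), ("big", None), ("mountain", "NN"),
            ("has", "VBZ"), ("more", "JJR"), ("trees", "NNS")]

def apply_adjective_rule(tagged):
    """Tag an unresolved word as JJ when it sits between a determiner and a noun."""
    result = list(tagged)
    for i in range(1, len(result) - 1):
        word, tag = result[i]
        prev_tag = result[i - 1][1]
        next_tag = result[i + 1][1]
        if (tag is None and prev_tag == "DT"
                and next_tag is not None and next_tag.startswith("NN")):
            result[i] = (word, "JJ")
    return result

print(apply_adjective_rule(sentence))
# [('The', 'DT'), ('big', 'JJ'), ('mountain', 'NN'), ('has', 'VBZ'), ('more', 'JJR'), ('trees', 'NNS')]
```

A real rule-based tagger would carry many such rules and apply them in a fixed order. This shows only the single rule from the example.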

3. Stochastic tagging:

This is based on the probability that a word occurs with a particular tag.
For example:
Our training file has,
Mary(N) Jane(N) can(M) see(V) will(N)
Spot(N) will(M) see(V) Mary(N)
Will(M) Jane(N) spot(V) Mary(N)
Mary(N) will(M) pat(V) spot(N)
The tags here are Noun (N), Verb (V) and Modal verb (M).
Stochastic tagging now calculates, for each word, the probability of each POS tag.

The probability values are calculated in this way for every word, and each word is given the tag with the highest probability.
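As a rough illustration of that calculation, the snippet below counts the tags in the four training sentences and turns the counts into probabilities. Lowercasing the words (so that ‘Will’ and ‘will’ are counted together) is an assumption of this sketch, and `tag_probabilities` and `best_tag` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# The four training sentences from above, lowercased (an assumption of this sketch).
training = [
    [("mary", "N"), ("jane", "N"), ("can", "M"), ("see", "V"), ("will", "N")],
    [("spot", "N"), ("will", "M"), ("see", "V"), ("mary", "N")],
    [("will", "M"), ("jane", "N"), ("spot", "V"), ("mary", "N")],
    [("mary", "N"), ("will", "M"), ("pat", "V"), ("spot", "N")],
]

def tag_probabilities(sentences):
    """Estimate P(tag | word) from relative frequencies in the training data."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    probs = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        probs[word] = {tag: n / total for tag, n in tag_counts.items()}
    return probs

probs = tag_probabilities(training)

def best_tag(word):
    """Return the tag with the highest probability for a known word."""
    return max(probs[word], key=probs[word].get)

print(probs["will"])      # {'N': 0.25, 'M': 0.75}
print(best_tag("will"))   # M
```

With this data ‘will’ appears three times as a modal and once as a noun, so it is tagged M.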

4. Hidden Markov model tagging:

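This assigns POS tags based on the sequence of tags that occurred in the training file. It combines two kinds of probability: how likely a tag is to follow the previous tag, and (as in stochastic tagging) how likely a word is to occur with a tag. Using both gives more accurate results. The Hidden Markov model is a probabilistic model that can be viewed as a sequence version of the Naive Bayes model.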
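To make this concrete, here is a small sketch of HMM tagging on the same four toy sentences used for stochastic tagging, decoded with the Viterbi algorithm. Several things are assumptions of the sketch and not part of the original example: words are lowercased, `<s>` is a made-up start-of-sentence marker, there is no smoothing (an unseen word or tag pair gets probability 0), and no end-of-sentence transition is modelled.

```python
from collections import Counter, defaultdict

training = [
    [("mary", "N"), ("jane", "N"), ("can", "M"), ("see", "V"), ("will", "N")],
    [("spot", "N"), ("will", "M"), ("see", "V"), ("mary", "N")],
    [("will", "M"), ("jane", "N"), ("spot", "V"), ("mary", "N")],
    [("mary", "N"), ("will", "M"), ("pat", "V"), ("spot", "N")],
]

transitions = defaultdict(Counter)  # previous tag -> counts of the next tag
emissions = defaultdict(Counter)    # tag -> counts of the words it produced
for sentence in training:
    prev = "<s>"                    # hypothetical start-of-sentence marker
    for word, tag in sentence:
        transitions[prev][tag] += 1
        emissions[tag][word] += 1
        prev = tag

def prob(table, given, outcome):
    """Relative frequency of `outcome` among everything counted under `given`."""
    total = sum(table[given].values())
    return table[given][outcome] / total if total else 0.0

def viterbi(words):
    """Return the most probable tag sequence for `words` as (word, tag) pairs."""
    tags = list(emissions)
    # Each column maps tag -> (probability of the best path ending here, previous tag)
    best = [{t: (prob(transitions, "<s>", t) * prob(emissions, t, words[0]), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            column[t] = max(
                (best[-1][pt][0] * prob(transitions, pt, t) * prob(emissions, t, word), pt)
                for pt in tags
            )
        best.append(column)
    # Follow the back-pointers from the best final tag to the start.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(zip(words, reversed(path)))

print(viterbi(["will", "can", "spot", "mary"]))
# [('will', 'N'), ('can', 'M'), ('spot', 'V'), ('mary', 'N')]
```

Taken alone, ‘will’ is most often a modal in this data, but the tag sequence makes the noun reading win here: ‘can’ is only ever seen as a modal, and a modal never follows another modal in the training file.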

Code

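    # Import the tokenizer and the POS tagger
    # (both need NLTK's tokenizer and tagger data, installable via nltk.download)
    from nltk import pos_tag
    from nltk.tokenize import word_tokenize

    sent = "The boys read books very fast"
    words = word_tokenize(sent)         # tokenize first
    print(words)                        # list of tokens
    print("POS is ", pos_tag(words))    # list of (word, tag) pairs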
 

Output

   ['The', 'boys', 'read', 'books', 'very', 'fast']
   POS is  [('The', 'DT'), ('boys', 'NNS'), ('read', 'VBP'), ('books', 'NNS'), ('very', 'RB'), ('fast', 'RB')]