Lemmatization and Stemming

Yazhini

Data Scientist @Tekclan

Before diving into the concepts, keep in mind that both stemming and lemmatization reduce a word to a root form. Let us see how they differ.

Stemming

Stemming works by cutting off the end or the beginning of a word, that is, the common prefixes and suffixes found in it, without checking whether the result is a real word.

Example:

    studies   --> studi (es is suffix)
    studying  --> study (ing is suffix)

Stemming the two words above, studies and studying, gives the stems studi and study. The stemmer simply removed each word's suffix, so two forms of the same word end up with different stems.
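To make the suffix-cutting idea concrete, here is a minimal sketch of a naive rule-based stemmer. The function `naive_stem` and its tiny prefix and suffix lists are hypothetical, chosen only to reproduce the examples above; real stemmers such as NLTK's `PorterStemmer` apply a much larger, ordered set of rules.

```python
# Hypothetical affix lists, just large enough for the examples in this post
PREFIXES = ("un",)
SUFFIXES = ("ing", "es", "s")


def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip at most one known prefix and one known suffix.

    Longer suffixes are tried first, and an affix is removed only if at
    least `min_stem` characters remain. No dictionary is consulted, so the
    result need not be a real word.
    """
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[: -len(suffix)]
            break
    return word


for w in ["studies", "studying"]:
    print(w, "-->", naive_stem(w))
# studies --> studi
# studying --> study
```

Because the rules ignore meaning, the same function also turns unrely into rely by stripping un, which is the kind of meaning change discussed later in this post.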

Lemmatization

Lemmatization, on the other hand, performs morphological analysis of words: it analyzes the structure of the given word and returns its lemma, the dictionary form. To do so, it needs detailed dictionaries that the lemmatization algorithm can look up.

Example:

    studies  --> study (morphological information: present tense, third-person singular)
    studying --> study (morphological information: gerund)

Thus lemmatization returns the base form of the word, whereas stemming does not necessarily do so.
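The dictionary lookup described above can be sketched as follows. This is a simplified, hypothetical lemmatizer in the spirit of WordNet-style lemmatization: irregular forms come from an exception list, and suffix-substitution rules produce candidates that are kept only if they appear in the dictionary. The miniature `LEXICON`, `EXCEPTIONS`, and `RULES` tables are made up for illustration and stand in for a full dictionary.

```python
# Hypothetical miniature dictionary of (lemma, part of speech) entries;
# a real lemmatizer would use a full lexical database such as WordNet.
LEXICON = {("study", "verb"), ("study", "noun"), ("loss", "noun"), ("knife", "noun")}

# Irregular inflections that no suffix rule can recover
EXCEPTIONS = {("knives", "noun"): "knife"}

# Suffix substitution rules per part of speech: (inflected ending, base ending)
RULES = {
    "noun": [("ies", "y"), ("ses", "s"), ("s", "")],
    "verb": [("ies", "y"), ("ing", ""), ("es", "e"), ("es", ""), ("s", "")],
}


def lemmatize(word: str, pos: str = "noun") -> str:
    """Return the dictionary form of `word` for the given part of speech."""
    if (word, pos) in EXCEPTIONS:
        return EXCEPTIONS[(word, pos)]
    if (word, pos) in LEXICON:
        return word
    for ending, base in RULES[pos]:
        if word.endswith(ending):
            candidate = word[: len(word) - len(ending)] + base
            # Unlike a stemmer, only accept candidates that are real entries
            if (candidate, pos) in LEXICON:
                return candidate
    return word  # unknown words are returned unchanged


print(lemmatize("studies", "verb"))   # study
print(lemmatize("studying", "verb"))  # study
print(lemmatize("knives"))            # knife
```

The part of speech matters here: with the default `pos="noun"` there is no rule for -ing, so studying comes back unchanged. NLTK's `WordNetLemmatizer.lemmatize` likewise takes a `pos` argument that defaults to noun.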

Advantages and Disadvantages

Building a stemmer is simpler than building a lemmatizer, which needs good linguistic dictionaries to work. On the other hand, lemmatization reduces noise and gives a more accurate base form of the word.

The actual meaning of a word may change after stemming or lemmatization. For instance, the word unrely will be reduced to rely, so its negative sense is turned into a positive one. Because of this, these NLP preprocessing methods are used mainly where matching different forms of a word matters more than preserving its exact meaning, such as e-commerce search.
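To see why this kind of normalization suits search, here is a toy product search that stems both the catalogue and the query before matching. The catalogue, the `normalize` suffix stripper, and `search` are all hypothetical; a production system would use a proper stemmer or lemmatizer together with a real search index.

```python
from typing import List

# Hypothetical product catalogue
PRODUCTS = ["Running shoes", "Kitchen knife set", "Study lamp"]


def normalize(token: str) -> str:
    """Crude hypothetical stemmer: lowercase, then strip one -ing or -s suffix."""
    token = token.lower()
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def search(query: str) -> List[str]:
    """Return products sharing at least one normalized token with the query."""
    query_terms = {normalize(t) for t in query.split()}
    return [
        p for p in PRODUCTS
        if query_terms & {normalize(t) for t in p.split()}
    ]


print(search("shoe"))   # ['Running shoes']
print(search("lamps"))  # ['Study lamp']
```

The same crude rules miss irregular forms: knives becomes knive and never matches knife, which is where a dictionary-based lemmatizer helps.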

Code

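    # import these modules
    import nltk
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    # WordNetLemmatizer needs the WordNet corpus; download it once
    nltk.download("wordnet", quiet=True)

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    words = ["studies", "losses", "knives"]
    for word in words:
        # Stemming: strip suffixes with the Porter algorithm
        print(f"stemming  {word:<8}: {stemmer.stem(word)}")
        # Lemmatization: look up the base form in WordNet (pos defaults to noun)
        print(f"lemmatize {word:<8}: {lemmatizer.lemmatize(word)}")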

Output

    stemming  studies : studi
    lemmatize studies : study
    stemming  losses  : loss
    lemmatize losses  : loss
    stemming  knives  : knive
    lemmatize knives  : knife
