Before diving into the concept, we all have in mind that both stemming and lemmatization generate the root form of a word. Let us see how they differ.
Stemming
Stemming works by cutting off the end or the beginning of a word, removing the common prefixes and suffixes found in inflected forms.
Example:
studies --> studi (es is the suffix)
studying --> study (ing is the suffix)
Stemming the two words studies and studying gives the stems studi and study. The stemmer simply removed the suffix of each word, so the two forms end up with different stems, and studi is not a real word.
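The suffix removal described above can be sketched in a few lines of Python. This is only a toy illustration, not the Porter algorithm: the suffix list and the function name naive_stem are assumptions made for the example.

```python
# Toy suffix stripper: an illustration only, not the Porter algorithm.
SUFFIXES = ("ing", "es", "s")  # hypothetical, deliberately tiny suffix list


def naive_stem(word):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ["studies", "studying"]:
    print(word, "-->", naive_stem(word))
```

Because the function only looks at the characters at the end of the word, it has no way of knowing that studi and study belong to the same word.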
Lemmatization
Lemmatization, on the other hand, performs a morphological analysis of the word: it examines the structure of the given word and generates its lemma. To do so, it needs detailed dictionaries that the lemmatization algorithm can look through.
Example:
studies --> study (morphological information: third person singular present tense of the verb)
studying --> study (morphological information: gerund of the verb)
Thus lemmatization generates a valid base form of the word, whereas stemming may not.
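The dictionary lookup behind lemmatization can be sketched as follows. The table LEMMA_TABLE and the function toy_lemmatize are hypothetical and hold only a handful of entries; a real lemmatizer relies on a full linguistic resource and morphological rules.

```python
# Toy lemma dictionary keyed by (word, part of speech); entries are hypothetical.
LEMMA_TABLE = {
    ("studies", "verb"): "study",
    ("studying", "verb"): "study",
    ("knives", "noun"): "knife",
}


def toy_lemmatize(word, pos):
    """Return the dictionary lemma, or the word itself if it is not listed."""
    return LEMMA_TABLE.get((word, pos), word)


for word, pos in [("studies", "verb"), ("studying", "verb"), ("knives", "noun")]:
    print(word, "-->", toy_lemmatize(word, pos))
```

Keying the table on the part of speech reflects the morphological information used in the example above: the same surface form can map to different lemmas depending on its grammatical role.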
Advantages and Disadvantages
Building a stemmer is simpler than building a lemmatizer, because lemmatization needs good linguistic dictionaries. In return, lemmatization reduces noise and gives an accurate base form of the word.
The actual meaning of a word may change after stemming or lemmatization. For instance, if the prefix is stripped, the word unrely is changed to rely, and the negative sense of the word is turned into a positive one. So these NLP processing methods are mainly used for e-commerce based searches.
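The loss of meaning described above can be reproduced with a toy prefix stripper. The prefix list and the function strip_prefix are assumptions for this sketch and do not correspond to any particular library.

```python
# Hypothetical list of negating prefixes, used only for this illustration.
NEGATIVE_PREFIXES = ("un", "non", "dis")


def strip_prefix(word):
    """Drop the first matching prefix, keeping at least three characters."""
    for prefix in NEGATIVE_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            return word[len(prefix):]
    return word


# The negation carried by the prefix disappears from the result.
for word in ["unrely", "unhappy"]:
    print(word, "-->", strip_prefix(word))
```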
Code
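# Import the stemmer and the lemmatizer from NLTK
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

# The lemmatizer needs the WordNet corpus; run nltk.download("wordnet") once if it is missing
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

words = ["studies", "losses", "knives"]
for word in words:
    # Stemming
    print("stemming", word, ":", stemmer.stem(word))
    # Lemmatization (the default part of speech is noun)
    print("lemmatize", word, ":", lemmatizer.lemmatize(word))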
Output
stemming studies : studi
lemmatize studies : study
stemming losses : loss
lemmatize losses : loss
stemming knives : knive
lemmatize knives : knife