Interview question at Celebal Technologies

Explain stemming, lemmatization, and tokenization.

Interview question answer

Anonymous user

Sep 14, 2022

Tokenization: the process of breaking a given text into smaller units called tokens. Words, numbers, and punctuation marks can all be tokens.

Stemming: the process of reducing a word to its root (stem), usually by stripping suffixes with simple rules. The result need not be a valid dictionary word.

Lemmatization: the process of finding the dictionary form (lemma) of a word. It differs from stemming in that it relies on a vocabulary and often on the word's part of speech, so it generally takes longer to compute than stemming.
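To make the three definitions concrete, here is a minimal toy sketch in plain Python. It is not how any particular NLP library implements these steps: the `LEMMAS` table, the suffix list, and the function names `tokenize`, `stem`, and `lemmatize` are all assumptions made up for illustration. Real stemmers (such as Porter-style stemmers) use many more rules, and real lemmatizers use a full dictionary.

```python
import re

# Hypothetical, hand-written lemma table for illustration only;
# a real lemmatizer looks words up in a full dictionary.
LEMMAS = {"studies": "study", "better": "good", "ran": "run", "mice": "mouse"}


def tokenize(text):
    """Split text into word/number tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def stem(word):
    """Crude suffix stripping; the result may not be a real word."""
    word = word.lower()
    for suffix in ("ies", "ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


def lemmatize(word):
    """Dictionary lookup; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMAS.get(word, word)


tokens = tokenize("The mice ran, studies ended.")
for token in tokens:
    print(token, stem(token), lemmatize(token))
```

In this sketch the stemmer turns "studies" into the non-word "stud", while the lemmatizer returns "study". Irregular forms such as "mice" are untouched by the stemmer but mapped to "mouse" by the lookup, which is the kind of case where lemmatization pays for its extra cost.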