The Inflection API
The Inflection
class converts words from their base form to a user-specified. inflection type. The class aggregates dictionary based lookup and rule based inflections, including the nerual-network models used to select the appropriate rules. It is implemented as a singleton that is instantiated for the first time when you call any of its methods from lemminflect
.
Only the base form of a word can be inflected and the library methods here expect the incoming word to be a lemma. If your word is not in its base form, first call the lemmatizer to get the base form. When using the spaCy extension, lemmatization is handled internally.
Examples
Usage as a library
> from lemminflect import getInflection, getAllInflections, getAllInflectionsOOV
> getInflection('watch', tag='VBD')
('watched',)
> getAllInflections('watch')
{'NN': ('watch',), 'NNS': ('watches', 'watch'), 'VB': ('watch',), 'VBD': ('watched',), 'VBG': ('watching',), 'VBZ': ('watches',), 'VBP': ('watch',)}
> getAllInflections('watch', upos='VERB')
{'VB': ('watch',), 'VBP': ('watch',), 'VBD': ('watched',), 'VBG': ('watching',), 'VBZ': ('watches',)}
> getAllInflectionsOOV('xxwatch', upos='NOUN')
{'NN': ('xxwatch',), 'NNS': ('xxwatches',)}
Usage as a extension to spaCy
> import spacy
> import lemminflect
> nlp = spacy.load('en_core_web_sm')
> doc = nlp('I am testing this example.')
> doc[4]._.inflect('NNS')
examples
Methods
getInflection
getInflection(lemma, tag, inflect_oov=True)
The method returns the inflection for the given lemma based on te PennTreebank tag. It first calls getAllInflections
and if none were found, calls getAllInflectionsOOV
. The flag allows the user to disable the rules based inflections. The return from the method is a tuple of different spellings for the inflection.
Arguments
- lemma: the word to inflect
- tag: the Penn-Treebank tag
- inflect_oov: if
False
the rules sytem will not be used.
getAllInflections
getAllInflections(lemma, upos=None)
This method does a dictionary lookup of the word and returns all lemmas. Optionally, the upos
tag may be used to limit the returned values to a specific part-of-speech. The return value is a dictionary where the key is the Penn Treebank tag and the value is a tuple of spellings for the inflection.
Arguments
- lemma: the word to inflect
- upos: Universal Dependencies part of speech tag the returned values are limited to
getAllInflectionsOOV
getAllInflectionsOOV(lemma, upos)
Similary to getAllInflections
, but uses the rules system to inflect words.
Arguments
- lemma: the word to inflect
- upos: Universal Dependencies part of speech tag the returned values are limited to
Spacy Extension
Token._.inflect(tag, form_num=0, inflect_oov=True, on_empty_ret_word=True)
The extension is setup in spaCy automatically when lemminflect
is imported. The above function defines the method added to Token
. Internally spaCy passes token information to a method in Inflections
which first lemmatizes the word. It then calls getInflection
and then returns the specified form number (ie.. the first spelling).
Arguments
- form_num: When multiple spellings exist, this determines which is returned
- inflect_oov: When
False
, only the dictionary will be used, not the OOV/rules system - on_empty_ret_word: If no result is found, return the original word
setUseInternalLemmatizer
setUseInternalLemmatizer(TF=True)
To inflect a word, it must first be lemmatized. To do this the spaCy extension calls the lemmatizer. Either the internal lemmatizer or spaCy's can be used. This function only impacts the behavior of the extension. No lemmatization is performed in the library methods.
Arguments
- TF: If
True
, use the LemmInflect lemmatizer, otherwise use spaCy's