Support inflection of nouns #19
+1. I think this was a main use case when we started discussing this project, as most of the placeholders in messages are nouns (proper or common). The solution will probably range from simple or complex algorithms plus lexicon exceptions to, potentially, ML models for some languages. I feel this is the first problem we should tackle, as it intersects well with common needs.
I'd say that for single words or just a few words, ML is undesirable. From experience, it's very resource intensive, which makes it a poor fit for resource-constrained environments. For many languages, a traditional algorithmic solution for out-of-vocabulary words is cheaper, faster, smaller, quicker to implement and more accurate than an ML solution. I have some horror stories around this topic. If you start handling many words or a whole sentence, ML starts looking more appealing, because such solutions thrive on context. I'd say the only exceptions to this rule are agglutinative languages, like Finnish and perhaps Turkish, where a general ML approach is more likely to be accurate. That requires a lengthier overview and education session on the topic. Choosing between ML and rule-based approaches will probably involve a discussion to find the right balance.
I expect most languages will be fine with the algorithmic + lexicon approach (and we should focus on those first). I would use ML only when necessary, as you mentioned for Finnish and Turkish. So this is not a decision we need to make ahead of time, just a reminder that we need to organize our code to allow different implementations.
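One way to keep the implementation swappable, as suggested in the comment above, is a small common interface with a lexicon of exceptions tried before a rule-based fallback. The Python sketch below is illustrative only: the class names (`NounInflector`, `LexiconInflector`, `EnglishPluralRules`, `ChainedInflector`) are hypothetical, and the English plural rules are a deliberately tiny subset. An ML-backed inflector for a language like Finnish could be added to the chain without changing callers.

```python
from abc import ABC, abstractmethod
from typing import Optional


class NounInflector(ABC):
    """Hypothetical interface: one implementation per language or technique."""

    @abstractmethod
    def inflect(self, noun: str, number: str) -> Optional[str]:
        """Return the inflected form, or None if this inflector cannot handle it."""


class LexiconInflector(NounInflector):
    """Looks up irregular forms in a hand-curated exception list."""

    def __init__(self, exceptions: dict[tuple[str, str], str]) -> None:
        self.exceptions = exceptions

    def inflect(self, noun: str, number: str) -> Optional[str]:
        return self.exceptions.get((noun, number))


class EnglishPluralRules(NounInflector):
    """A very small rule set for regular English plurals (illustrative only)."""

    def inflect(self, noun: str, number: str) -> Optional[str]:
        if number == "singular":
            return noun
        if noun.endswith(("s", "x", "z", "ch", "sh")):
            return noun + "es"
        if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
            return noun[:-1] + "ies"
        return noun + "s"


class ChainedInflector(NounInflector):
    """Tries each strategy in order; the first non-None answer wins."""

    def __init__(self, *strategies: NounInflector) -> None:
        self.strategies = strategies

    def inflect(self, noun: str, number: str) -> Optional[str]:
        for strategy in self.strategies:
            result = strategy.inflect(noun, number)
            if result is not None:
                return result
        return None


english = ChainedInflector(
    LexiconInflector({("child", "plural"): "children", ("mouse", "plural"): "mice"}),
    EnglishPluralRules(),
)
print(english.inflect("child", "plural"))  # children
print(english.inflect("city", "plural"))   # cities
```

Putting the lexicon first means exceptions always override the rules, which matches the "algorithms + lexicon exceptions" idea without tying callers to either technique.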
…CENSE.txt for copyright and permission details. This contribution should resolve the following issues: #5, #6, #7, #11, #12, #13, #15, #17, #18, #19 This contribution is also related to the following issues without fully resolving the issues: 3, 4, 8, 10, 21, 23, 24, 25 This contribution also has an implementation that addresses these CLDR issues: 13025, 13563
I'd like to nominate this to be resolved with pull request #35.
We should be able to inflect common nouns and proper nouns. In many languages, this would typically include being able to change the grammatical gender, grammatical number and grammatical case of the noun.
What English expresses with prepositions is often expressed with grammatical case in other languages, typically in the form of suffixes on nouns. This makes this issue related to issue #17.
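To illustrate how an English preposition can turn into a case suffix, here is a deliberately simplified Python sketch of some Finnish local cases. The mapping and the function name `inflect_for_preposition` are hypothetical, and the English glosses are rough. The sketch assumes stems ending in a single short vowel and ignores consonant gradation (so it would wrongly produce "katussa" rather than "kadussa" for katu). It also approximates vowel harmony by checking whether the stem contains a, o or u.

```python
# Hypothetical mapping from rough English preposition glosses to Finnish
# local-case suffixes, written in their back-vowel form.
LOCAL_CASE_SUFFIXES = {
    "in": "ssa",      # inessive: talossa "in the house"
    "out of": "sta",  # elative: talosta "out of the house"
    "on": "lla",      # adessive: talolla "on/at the house"
    "off": "lta",     # ablative: talolta "off/from the house"
    "onto": "lle",    # allative: talolle "onto/to the house"
}

VOWELS = "aeiouyäö"
BACK_VOWELS = "aou"


def harmonize(suffix: str, stem: str) -> str:
    """Simplified vowel harmony: stems without a, o or u take front-vowel suffixes."""
    if any(ch in BACK_VOWELS for ch in stem):
        return suffix
    return suffix.replace("a", "ä")


def inflect_for_preposition(stem: str, preposition: str) -> str:
    """Attach the case suffix that plays the role of an English preposition."""
    if not stem or stem[-1] not in VOWELS:
        raise ValueError("this sketch only handles vowel-final stems")
    if preposition == "into":
        # Illative: lengthen the final vowel and add -n (talo -> taloon).
        return stem + stem[-1] + "n"
    return stem + harmonize(LOCAL_CASE_SUFFIXES[preposition], stem)


print(inflect_for_preposition("talo", "in"))    # talossa
print(inflect_for_preposition("kylä", "in"))    # kylässä
print(inflect_for_preposition("talo", "into"))  # taloon
```

A real implementation would need a lexicon or morphological analyzer for stems, which is where the rule-based versus ML discussion above becomes relevant.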
English possessive/genitive forms of nouns typically add "'s" or just "'", but that algorithmic logic is a lot harder in other languages, like German, Danish, Dutch, Russian and so forth. For example, you should be able to turn "city" into "city's" or turn "cities" into "cities'". For a language like Russian, you can look at кот for an example.
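The English rule described above is simple enough to sketch directly. Note that it depends on grammatical number and not only on spelling: "children" is plural but still takes "'s". The function name `english_possessive` is hypothetical, and the choice of "bus's" (rather than "bus'") for singular nouns ending in s is an assumption, since style guides differ on that point.

```python
def english_possessive(noun: str, is_plural: bool) -> str:
    """Form the English possessive: add 's, or only ' to plurals ending in s."""
    # Assumption: singular nouns ending in s still get 's ("bus's").
    if is_plural and noun.endswith("s"):
        return noun + "'"
    return noun + "'s"


print(english_possessive("city", is_plural=False))     # city's
print(english_possessive("cities", is_plural=True))    # cities'
print(english_possessive("children", is_plural=True))  # children's
```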
Here's a more compact declension table for looking at such information.
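As a rough illustration of the rule-based side for Russian, the sketch below generates the case forms of кот from its stem using an endings table for one declension class (hard-stem masculine nouns). The table layout and the function name `decline_hard_masculine` are hypothetical. The sketch does not handle spelling rules after consonants such as к, г, х, ж, ш, ч, щ, ц, fleeting vowels (рот becomes рта in the genitive) or other irregular nouns; those would still need lexicon entries or further rules.

```python
# Hypothetical endings table for one Russian declension class:
# hard-stem masculine nouns such as кот ("cat") or стол ("table").
# Not handled: spelling rules after к, г, х, ж, ш, ч, щ, ц,
# fleeting vowels (рот -> рта) and other irregular nouns.
HARD_MASCULINE_ENDINGS = {
    "singular": {
        "nominative": "", "genitive": "а", "dative": "у",
        "instrumental": "ом", "prepositional": "е",
    },
    "plural": {
        "nominative": "ы", "genitive": "ов", "dative": "ам",
        "instrumental": "ами", "prepositional": "ах",
    },
}


def decline_hard_masculine(stem: str, case: str, number: str, animate: bool) -> str:
    """Return stem + ending for the requested case and number."""
    if case == "accusative":
        # The masculine accusative copies the genitive for animate nouns
        # and the nominative for inanimate ones.
        case = "genitive" if animate else "nominative"
    return stem + HARD_MASCULINE_ENDINGS[number][case]


for case in ("nominative", "genitive", "accusative", "instrumental", "prepositional"):
    print(case, decline_hard_masculine("кот", case, "singular", animate=True))
```

The animacy flag shows why a noun needs more than a stem and a gender: кот (animate) and стол (inanimate) share endings but differ in the accusative.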