Dealing with plurals

This page explains (or tries to explain) how I find the plural words in my various lexicons. I wanted to do this so that my polygon solver could offer options to exclude plural words or, in some cases, only plural words that end in 's'.
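Once a set of plural words exists, those two options come down to a simple filter. The sketch below assumes a precomputed `plurals` set; the function name `filter_candidates` and the mode strings are hypothetical, and this is not the solver's actual code.

```python
from typing import Iterable, List, Set


def filter_candidates(words: Iterable[str], plurals: Set[str], mode: str = "keep") -> List[str]:
    """Drop plural words from a solver's candidate list (hypothetical helper).

    mode "keep":         return every word unchanged
    mode "no_plurals":   drop every word found in `plurals`
    mode "no_s_plurals": drop only plurals that end in 's'
    """
    if mode == "keep":
        return list(words)
    if mode == "no_plurals":
        return [w for w in words if w not in plurals]
    if mode == "no_s_plurals":
        # Irregular plurals such as "brethren" survive this mode
        return [w for w in words if not (w in plurals and w.endswith("s"))]
    raise ValueError(f"unknown mode: {mode!r}")


plurals = {"cats", "brethren"}
words = ["cat", "cats", "brethren", "tab"]
print(filter_candidates(words, plurals, "no_plurals"))    # ['cat', 'tab']
print(filter_candidates(words, plurals, "no_s_plurals"))  # ['cat', 'brethren', 'tab']
```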

My data sources are just lists of words: I have no idea which words in these lists are plurals. So I needed a way to automate deciding whether or not a word is a plural, because my list of 224,000 words was far too long to go through by hand.

Then I found the (rather amazing) Python module inflect.py. Its engine has a singular_noun() method that returns the singular version of a plural noun, or False if it considers the word to be singular already. For example, singular_noun("brethren") gives me "brother", which is great. However, pure inflection does not always work: singular_noun("asparagus"), for example, gives me "asparagu", which is not a word (at least not one that I'm familiar with).
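inflect's rules are far more thorough than anything shown here, but the "asparagu" failure is easy to reproduce with any rule-based approach. The hypothetical `naive_singular()` below is a deliberately crude suffix-stripping stand-in, not inflect's actual logic. Like singular_noun(), it returns False for words it treats as already singular.

```python
from typing import Union


def naive_singular(word: str) -> Union[str, bool]:
    """Crude rule-based singularizer (hypothetical stand-in, not inflect).

    Follows the singular_noun() convention of returning False when the
    word does not look plural.
    """
    if not word.endswith("s") or word.endswith("ss"):
        return False            # treat as already singular
    if word.endswith("ies"):
        return word[:-3] + "y"  # berries -> berry
    return word[:-1]            # cats -> cat, but also asparagus -> asparagu


for w in ["cats", "berries", "glass", "asparagus"]:
    print(w, "->", naive_singular(w))
# cats -> cat
# berries -> berry
# glass -> False
# asparagus -> asparagu
```

The rules only look at spelling, so they cannot tell that "asparagus" is not a plural at all. That is why a separate check against a real word list is needed.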

To overcome this, I additionally check that the result from singular_noun(potential_plural) is also in my dictionary. If it is, then I add potential_plural to my list of plurals.
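Here is a minimal Python sketch of this dictionary check. It assumes the whole word list fits in memory as a set. The singularizer is passed in as a parameter so that the sketch runs without inflect installed; with inflect you would pass `inflect.engine().singular_noun`. The lookup table used as `stub` is hypothetical and only imitates inflect's answers for a few words.

```python
from typing import Callable, Iterable, Set, Union

# Any function mapping a word to its singular, or to False if it isn't plural
Singularizer = Callable[[str], Union[str, bool]]


def find_plurals(words: Iterable[str], singular_noun: Singularizer) -> Set[str]:
    """Return the words whose singular form is itself in the word list."""
    dictionary = set(words)  # set gives fast membership tests
    plurals = set()
    for potential_plural in dictionary:
        singular = singular_noun(potential_plural)
        # False means "not a plural"; a non-word singular such as
        # "asparagu" fails the dictionary check
        if singular and singular in dictionary:
            plurals.add(potential_plural)
    return plurals


# Hypothetical stand-in for inflect's answers on a few words
_answers = {"brethren": "brother", "asparagus": "asparagu", "cats": "cat"}


def stub(word: str) -> Union[str, bool]:
    return _answers.get(word, False)


words = ["brother", "brethren", "asparagus", "cat", "cats"]
print(sorted(find_plurals(words, stub)))  # ['brethren', 'cats']
```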

This all seemed to work out pretty well. But then I noticed a few words ending in 'ss' in the results, none of which could possibly be a plural, so I exclude all of these from my list of plurals. I also exclude every word for which singular_noun() returns the word itself. Apart from (so far) removing the word 'cosmos' by hand, I have hardly had to modify the resulting list of 61,663 plural words. If you find any other anomalies, then *please* let me know!
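Folding the two extra exclusions and the hand-removed 'cosmos' into the same loop gives something like the sketch below. As before, the singularizer is a parameter (`inflect.engine().singular_noun` in real use). The stub answers are hypothetical and were picked only to trigger each rule, so they say nothing about how inflect itself treats these particular words. The name `find_plurals_strict` is also hypothetical.

```python
from typing import AbstractSet, Callable, Iterable, Set, Union

# Any function mapping a word to its singular, or to False if it isn't plural
Singularizer = Callable[[str], Union[str, bool]]


def find_plurals_strict(
    words: Iterable[str],
    singular_noun: Singularizer,
    manual_exclusions: AbstractSet[str] = frozenset({"cosmos"}),
) -> Set[str]:
    """Dictionary-checked plurals, minus the known kinds of false positive."""
    dictionary = set(words)
    plurals = set()
    for word in dictionary:
        if word.endswith("ss") or word in manual_exclusions:
            continue  # 'ss' words and words removed by hand
        singular = singular_noun(word)
        if not singular or singular == word:
            continue  # not a plural, or singular is identical to the word
        if singular in dictionary:
            plurals.add(word)
    return plurals


# Hypothetical stand-in answers, chosen to exercise each exclusion
_answers = {"cats": "cat", "glass": "glas", "sheep": "sheep", "cosmos": "cosmo"}


def stub(word: str) -> Union[str, bool]:
    return _answers.get(word, False)


words = ["cat", "cats", "glas", "glass", "sheep", "cosmo", "cosmos"]
print(sorted(find_plurals_strict(words, stub)))  # ['cats']
```

Keeping the manual exceptions in a parameter means that a newly reported anomaly can be added to that set without touching the rest of the logic.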

All in all, I think this algorithm for detecting plural words is highly (>99%) accurate, but not quite foolproof.

I hope all this makes at least some sort of sense!


Comments to
Andy's anagram solver
Andy's word finder
Andy's polygon solver - NEW! (no really)
Andy's home page