compromise
language is complicated.
and there's a gazillion words.
compromise is a javascript library
that interprets and pre-parses text.
so things are way easier.
it's easy to forget that
the top 1,000 words
are 80% of english
– easy –
unambiguous, common speech
20%
( harder stuff )
people get obsessed with that end bit →
with clever computer science, a good +15% accuracy is possible -
by using gigabyte models, linear-algebra, or lisp! (no offence!)
but, really.
a 10%-12% bump in accuracy is realistic,
using config, plugins, and customization.
... compromise!
compromise works by compressing a large list of words,
and then expanding them at runtime.
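to make the compress-then-expand idea concrete, here is a minimal sketch. it uses front coding (each word stored as a shared-prefix count plus the leftover letters), which is a stand-in technique and not necessarily the format compromise itself uses. the names `pack` and `unpack` are hypothetical, and the sketch assumes words contain no commas and share at most 9 leading characters per step.

```javascript
// pack: sort the words, then store each one relative to the previous word
// as "<number of shared leading characters><remaining letters>".
// (hypothetical helper, not part of compromise's API)
function pack(words) {
  const sorted = [...new Set(words)].sort()
  let prev = ''
  return sorted
    .map((word) => {
      let shared = 0
      // cap at 9 so the count always fits in a single digit
      const max = Math.min(prev.length, word.length, 9)
      while (shared < max && prev[shared] === word[shared]) shared += 1
      prev = word
      return shared + word.slice(shared)
    })
    .join(',')
}

// unpack: rebuild every word from the previous one, at runtime.
function unpack(packed) {
  if (packed === '') return []
  let prev = ''
  return packed.split(',').map((chunk) => {
    const shared = Number(chunk[0])
    prev = prev.slice(0, shared) + chunk.slice(1)
    return prev
  })
}

const words = ['walk', 'walked', 'walking', 'walks', 'wall', 'want', 'wanted']
const packed = pack(words) // '0walk,4ed,4ing,4s,3l,2nt,4ed'
const lexicon = new Set(unpack(packed)) // expanded once, then used for lookups

console.log(packed.length, 'chars packed vs', words.join(',').length, 'chars raw')
console.log(lexicon.has('walking'))
```

the packed string is what would ship over the wire; the expanded `Set` only exists in memory, so lookups stay cheap after the one-time unpack.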
how big is the word-list?
# of words    % coverage
1             10%
100           50%
300           65%
600           70%
1,000         80%
5,000         90%
10,000        98%
14,000        99.99%
(big enough!)
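if you want to check coverage numbers like these against your own text, a rough sketch follows. the `coverage` function and the toy sentence are made up for illustration; the percentages above come from this page, not from this code.

```javascript
// coverage: what share of all tokens in `text` is accounted for
// by its `topN` most frequent words? (hypothetical helper)
function coverage(text, topN) {
  // crude tokenizer: lowercase runs of letters and apostrophes
  const tokens = text.toLowerCase().match(/[a-z']+/g) || []
  if (tokens.length === 0) return 0

  // count occurrences of each distinct word
  const counts = new Map()
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1)
  }

  // add up the counts of the topN most frequent words
  const sorted = [...counts.values()].sort((a, b) => b - a)
  const covered = sorted.slice(0, topN).reduce((sum, n) => sum + n, 0)
  return covered / tokens.length
}

const sample = 'the cat sat on the mat and the dog sat on the log'
for (const n of [1, 3, 8]) {
  console.log(n, 'words cover', (coverage(sample, n) * 100).toFixed(1) + '%')
}
```

on a real corpus the curve flattens quickly, which is the point of the table: the first few hundred words do most of the work.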
the final size of the lexicon is 40kb.
this gif is 133kb
compromise is fast enough to run on keypress:
don't take my word for it,
try it on every State of the Union Address:
(should take ~10s)

tags:

#Noun - 'friendship'
#Singular - 'owl'
#Person - 'Jasper Beardly'
#FirstName - 'troy'
#MaleName - 'john'
#FemaleName - 'jane'
#LastName - 'mcclure'
#Place - 'Toronto, Ontario'
#Country - 'France'
#City - 'springfield'
#Region - 'Florida'
#Address - '742 evergreen terrace'
#Organization - 'St Michael's Church'
#SportsTeam - 'Red Sox'
#Company - 'Globex Corp'
#School - 'Hillcrest highschool'
#ProperNoun - 'France'
#Honorific - 'Dr.'
#Plural - 'owls'
#Uncountable - 'air'
#Pronoun - 'he'
#Actor - 'bowler'
#Activity - 'swimming'
#Unit - 'kilos'
#Demonym - 'canadians'
#Possessive - 'Spencer's'
#Verb - 'would have walked'
#PresentTense - 'walks'
#Infinitive - 'walk'
#Gerund - 'walking'
#PastTense - 'walked'
#Copula - 'is'
#Modal - 'would'
#PerfectTense - 'have walked'
#FuturePerfect - 'will have walked'
#Pluperfect - 'had walked'
#PhrasalVerb - 'walk off'
#Particle - 'off' of 'walk off'
#Value
#Cardinal - '5'
#Ordinal - '5th'
#RomanNumeral - 'XLII'
#TextValue - 'five'
#NumericValue - '5'
#Percent - '4.3%'
#Money - '$5.20'
#Date - 'june 2nd'
#Month - 'june'
#WeekDay - 'wednesday'
#Adjective - 'fast'
#Comparable - 'fast'
#Comparative - 'faster'
#Superlative - 'fastest'
#Contraction - 'he's'
#Adverb - 'quickly'
#Currency - 'USD'
#Determiner - 'the'
#Conjunction - 'and'
#Preposition - 'of'
#QuestionWord - 'who'
#Expression - 'Gee'
#Url - 'http://compromise.cool'
#HashTag - '#nlp'
#PhoneNumber - '(800) 555-0000'
#AtMention - '@nlp_compromise'
#Emoji - '😊'
#Emoticon - ':)'
#Email - 'hi@compromise.cool'
#Auxiliary - 'would have had'
#Negative - 'not'
#Abbreviation - 'st.'
#Acronym - 'CIA'

Docs: