Harper checks your writing fast, without compromising your privacy.
Fantastic work! I was so fed up with Grammarly and instantly installed this.
I'm just a bit skeptical about this quote:
> Harper takes advantage of decades of natural language research to analyze exactly how your words come together.
But it's just a rather small collection of hard-coded rules:
https://docs.rs/harper-core/latest/harper_core/linting/trait...
Where did the decades of classical NLP go? No gold-standard resources like WordNet? No statistical methods?
There's nothing wrong with this; the solution is a good pragmatic choice. It's just interesting how our collective consciousness of expansive scientific fields can be so thoroughly purged when a new paradigm arises.
LLMs have completely overshadowed ML NLP methods from 10 years ago, and they themselves replaced decades of statistical NLP work, which in turn replaced another few decades of symbolic, grammar-based NLP work.
Progress is good, but it's important not to forget all those hard-earned lessons; it can sometimes be a real superpower to be able to leverage that old toolbox in modern contexts. In many ways, we had much more advanced methods in the 60s for solving this problem than what Harper is doing here by naively reinventing the wheel.
I'll admit it's something of a bold label, but there is truth in it.
Before our rule engine has a chance to touch the document, we run several pre-processing steps that imbue the words it reads with semantic meaning.
> LLMs have completely overshadowed ML NLP methods from 10 years ago, and they themselves replaced decades of statistical NLP work, which in turn replaced another few decades of symbolic, grammar-based NLP work.
This is a drastic oversimplification. I'll admit that transformer-based approaches are indeed quite prevalent, but I do not believe that "LLMs" in the conventional sense are "replacing" a significant fraction of NLP research.
I appreciate your skepticism and attention to detail.
Here's an article you might find interesting: https://www.quantamagazine.org/when-chatgpt-broke-an-entire-...
For someone who would like to study/learn that evolution, any good recs?
This skips over Bag of Words / N-Gram / TF-IDF and many other things, but paints a reasonable picture of the progression.
1. https://jalammar.github.io/illustrated-word2vec/
2. https://jalammar.github.io/visualizing-neural-machine-transl...
3. https://jalammar.github.io/illustrated-transformer/
4. https://jalammar.github.io/illustrated-bert/
5. https://jalammar.github.io/illustrated-gpt2/
And from there it's mostly work on improving optimization (both at training and inference time), training techniques (many stages), data (quality and modality), and scale.
---
There are also state space models, but I don't believe they've gone mainstream yet.
https://newsletter.maartengrootendorst.com/p/a-visual-guide-...
And diffusion models, but I'm struggling to find a good resource, so: https://ml-gsai.github.io/LLaDA-demo/
---
All this being said: many tasks are solved very well using a linear model and TF-IDF, and the results are actually interpretable.
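For a concrete picture of that baseline, here's a minimal sketch using scikit-learn; the toy texts, labels, and the classification task itself are made up purely for illustration:

```python
# TF-IDF features + logistic regression: a small, interpretable text classifier.
# The texts and labels below are toy data for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "refund my order immediately",
    "love the new update, great work",
    "the app crashes on startup",
    "fantastic support experience",
]
labels = ["complaint", "praise", "complaint", "praise"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["the app crashes constantly"]))  # expected: ['complaint']

# Interpretability: one learned weight per n-gram feature, directly inspectable.
vec = model.named_steps["tfidfvectorizer"]
clf = model.named_steps["logisticregression"]
weights = dict(zip(vec.get_feature_names_out(), clf.coef_[0]))
# Most negative weights pull toward the first class ("complaint").
print(sorted(weights.items(), key=lambda kv: kv[1])[:3])
```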
This is indeed the previous generation, but it's not even that old. When I was coming out of undergrad, word2vec was the brand-new thing that was eating up the whole field.
Indeed, before that there was a lot of work on applying classical ML classifiers (Naive Bayes, Decision Trees, SVM, Logistic Regression...) and clustering algorithms (fancily referred to as unsupervised ML) to bag-of-words vectors. This was a big field, with some overlap with Information Retrieval, which lent itself to fancier weightings and normalizations of bag-of-words vectors (TF-IDF, BM25). There was also the whole field of Topic Modeling.
Before that there was a ton of statistical NLP modeling (Markov chains and such), primarily focused on machine translation before neural networks got good enough (like the early versions of Google Translate).
And before that there were a few decades of research on grammars (starting with Chomsky), with a lot of overlap with compilers, theoretical CS (state-machines and such) and symbolic AI (lisps, logic programming, expert systems...).
I myself don't have a very clear picture of all of this. I learned some in undergrad and read a few ancient NLP books (60s - 90s) out of curiosity. I started around the time when NLP, and AI in general, had been rather stagnant for a decade or two. It was rather boring and niche, believe it or not, but was starting to be revitalized by the new wave of ML and then word2vec with DNNs.
I would much rather check my writing against grammatical rules that are hard-coded in an open source program—meaning that I can change them—than ones that I imagine would be subject to prompt fiddling or, worse, implicitly hard-coded in a tangle of training data that the LLM would draw from.
The Neovim configuration for the LSP looks neat: https://writewithharper.com/docs/integrations/neovim
The whole thing seems cool. Automattic should mention this on their homepage. Tools like this are the future of something.
You would lose out on evolution of language.
Natural languages evolve so slowly that writing and editing rules for them is easily achievable even this way. Think years versus minutes.
Aight you win fam, I was trippin fr. You're absolutely bussin, no cap. Harvard should be taking notes.
(^^ alien language that was developed in less than a decade)
The existence of common slang that isn't used in the sort of formal writing grammar linting tools are typically designed to promote is more a weakness of learning grammar from a weighted model of the internet (versus formal grammatical rules) than a strength.
Not an insurmountable problem: ChatGPT will use "aight fam" only in context-sensitive ways and will remove it if you ask it to rephrase to sound more like a professor. But RLHFing slang into predictable use is likely a bigger challenge than simply keeping the word list of an open source program up to date enough to include slang whose etymology dates back to the noughties or nineties, if phrasing things in that particular vernacular is even a target for your grammar linting tool...
Huh, this is the first time I've seen "noughties" used to describe the first decade of the 2000s. Slightly amusing that it's surely pronounced like "naughties". I wonder if it'll catch on and spread.
The fact that you never saw it before suggests it did not catch on and spread during the last 25 years.
‘Noughties’ was popular in Australia from 2010 onwards. Radio stations would “play the best from the eighties nineties noughties and today”.
Common in Britain too, also appears in the opening lines of the Wikipedia description for the decade and the OED.
I don't think anyone has the need to check such a message for grammar or spelling mistakes. Even then, I would not rely on an LLM to accurately track this "evolution of language".
Yes, precisely. This "less than a decade" is orders of magnitude more than the hours or days it would take to manually add those words and idioms to proper dictionaries and/or write new grammar rules to accommodate things like dropping the "g" in continuous verbs to get "bussin" or "bussin'" instead of "bussing". Thank you for illustrating my point.
Also, it takes at most a few developers to write those rules into a grammar checking system, compared to the millions and more who need to learn a given piece of "evolved" language as it becomes impossible to avoid. Not only is it fast enough to do this manually, it's also much less work-intensive and more scalable.
Not exactly. It takes time for those words to become mainstream for a generation. While you'd have to manually add those words in dictionaries, LLMs can learn these words on the fly, based on frequency of usage.
At this point we're already using different definitions of grammar and vocabulary - are they discrete (as in a rule system, vide Harper) or continuous (as in a probability, vide LLMs). LLMs, like humans, can learn them on the fly, and, like humans, they'll have problems and disagreements judging whether something should be highlighted as an error or not.
Or, in other words: if you "just" want a utility that can learn speech on the fly, you don't need a rigid grammar checker, just a good enough approximator. If you want to check if a document contains errors, you need to define what an error is, and then if you want to define it in a strict manner, at that point you need a rule engine of some sort instead of something probabilistic.
I’m glad we have people at HN who could have eliminated decades of effort by tens of thousands of people, had they only been consulted first on the problem.
Which effort? Learning a language is something that can't be eliminated. Everyone needs to do it on their own. Writing grammar checking software, though, can be done a few times and then copied.
Pedantically,
aight, trippin, fr (at least the spoken version), and fam were all very common in the 1990s (which was the last decade I was able to speak like that without getting jeered at by peers).
Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly". You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?
> Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly".
Languages are used to successfully communicate. To achieve this, all parties involved in the communication must know the language well enough to send and receive messages. This obviously includes messages that transmit changes in the language, for instance, if you tried to explain to your parents the meaning of the current short-lived meme and fad nouns/adjectives like "skibidi ohio gyatt rizz".
It takes time for a language feature to become widespread and de facto standardized among a population. This is because people need to asynchronously learn it, start using it themselves, and gain critical mass, so that even people who do not like using that feature need to start respecting its presence. This inertia is the main source of the slowness I mention, and also a requirement for any kind of grammar-checking software. From the point of view of such software, a language feature that (almost) nobody understands is not a language feature but an error.
> You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?
Yes, that set of patterns is called a language grammar. Even dialects and slangs have grammars of their own, even if they're different, less popular, have less formal materials describing them, and/or aren't taught in schools.
Fair enough, thanks for replying. I don't see the task of specifying a grammar as straightforward as you do, perhaps. I guess I just didn't understand the chain of comments.
I find that clear-cut, rigid rules tend to be the least helpful ones in writing. Obviously this class of rule is also easy/easier to represent in software, so it also tends to be the source of false positives and frustration that lead me to disable such features altogether.
When you do writing as a form of art, rules are meant to be bent or broken; it's useful to have the ability to explicitly write new ones and make new forms of the language legal, rather than wrestle with hallucinating LLMs.
When writing for utility and communication, though, English grammar is simple and standard enough. Browsing Harper sources, https://github.com/Automattic/harper/blob/0c04291bfec25d0e93... seems to have a lot of the basics already nailed down. Natural language grammar can often be represented as "what is allowed to, should, or should not, appear where, when, and in which context" - IIUC, Harper seems to tackle the problem the same way.
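As a toy sketch of that "what may appear where, and in which context" idea (this is not Harper's actual implementation, which is in Rust; the rules and names below are hypothetical and only illustrate the general shape of a pattern-based rule engine):

```python
# Hypothetical, simplified lint rules in the "what may appear where" style.
# Not Harper's code; just an illustration of the approach.
import re
from dataclasses import dataclass

@dataclass
class Lint:
    start: int    # character offsets of the flagged span
    end: int
    message: str

def repeated_word_rule(text: str) -> list[Lint]:
    """Flag immediate word repetitions like 'the the'."""
    return [
        Lint(m.start(), m.end(), f"Repeated word: '{m.group(1)}'")
        for m in re.finditer(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE)
    ]

def article_rule(text: str) -> list[Lint]:
    """Crude 'a' vs 'an' check based on the next letter; real rules need
    pronunciation data ('an hour', 'a university')."""
    lints = []
    for m in re.finditer(r"\b(a|an)\s+(\w+)", text, flags=re.IGNORECASE):
        article, word = m.group(1).lower(), m.group(2)
        vowel = word[0].lower() in "aeiou"
        if (article == "a" and vowel) or (article == "an" and not vowel):
            lints.append(Lint(m.start(), m.end(),
                              f"Consider '{'an' if vowel else 'a'} {word}'"))
    return lints

text = "This is is a interesting tool."
for rule in (repeated_word_rule, article_rule):
    for lint in rule(text):
        print(lint)
```

A real linter would of course work over proper tokens and dictionary metadata rather than raw regexes, but the "match a pattern in context, emit a suggestion" shape is the same.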
Just because the rules aren’t set fully in stone, or can be bent or broken, doesn’t mean they don’t “exist” - perhaps not the way mathematical truths exist, but there’s something there.
Even these few posts follow innumerable “rules” which make it easier to (try) to communicate.
Perhaps what you’re angling against is where rules of language get set in stone and fossilized until the “Official” language is so diverged from the “vulgar tongue” that it’s incomprehensibly different.
Like church/legal Latin compared to Italian, perhaps. (Fun fact - the Vulgate translation of the Bible was INTO the vulgar tongue at the time: Latin).
I don't need grammar to evolve in real time. In fact, having a stabilizing function is probably preferable to the alternative.
If a language changes, there are only three possible options: either it becomes more expressive; or it becomes less expressive; or it remains as expressive as before.
Certainly we would never want our language to be less expressive. There’s no point to that.
And what would be the point of changing for the sake of change? Sure, we blop use the word ‘blop’ instead of the word ‘could’ without losing or gaining anything, but we’d incur the cost of changing books and schooling for … no gain.
Ah, but it’d be great to increase expressiveness, right? The thing is, as far as I am aware all human languages are about equal in terms of expressiveness. Changes don’t really move the needle.
So, what would the point of evolution be? If technology impedes it … fine.
The world that we need to be expressive about is changing.
> So, what would the point of evolution be?
Being equally as expressive overall but being more focussed where current needs are.
OTOH, I don't think anything is going to stop language from evolving in that way.
I can write em-dashes on my keyboard in one second using the compose key: right alt + ---
Same here — the compose key is so convenient you forget most people have never heard of it. This "em-dashes mean LLM output" thing is getting annoying though.
> This "em-dashes mean LLM output" thing is getting annoying though.
Agreed. Same with those non-ASCII single and double quotes.
LanguageTool (a Grammarly competitor) is also open source and can be managed locally:
https://github.com/languagetool-org/languagetool
I generally run it in a Docker container on my local machine:
https://hub.docker.com/r/erikvl87/languagetool
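For anyone curious what talking to that local server looks like, here's a minimal sketch against LanguageTool's /v2/check HTTP API. The port (8010) follows that image's documented default as far as I recall, and the example sentence is made up, so adjust to your setup:

```python
# Minimal client for a locally running LanguageTool server.
# Assumes the container publishes the API on localhost:8010 (adjust if yours differs).
import requests

resp = requests.post(
    "http://localhost:8010/v2/check",
    data={"text": "This sentence have a error.", "language": "en-US"},
)
resp.raise_for_status()

for match in resp.json()["matches"]:
    suggestions = [r["value"] for r in match["replacements"][:3]]
    print(f"{match['message']} -> {suggestions}")
```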
I haven't messed with Harper closely but I am aware of its existence. It's nice to have options, though.
It would sure be nice if the Harper website made clear that one of the two competitors it compares itself to can also be run locally.
There are two versions of LanguageTool: open source and cloud-based. The open source version checks individual words against a dictionary, much like the system's spell checker. Maybe there is something more to it, but in my tests it did not fix even obvious errors. It's not an alternative to Grammarly or this tool.
There is. It can be heavily customized to your needs and built to leverage a large ngram data set:
https://dev.languagetool.org/finding-errors-using-n-gram-dat...
I would suggest diving into it more because it seems like you missed how customizable it is.
This is a really nice app for using LanguageTool; it runs the server inside the flatpak: https://flathub.org/apps/re.sonny.Eloquent