The 100KB Search Index: Lunr.js Optimization

One of the first things people ask when they start using Jottings is: "How does search work on my site if there's no backend database?"

The answer is Lunr.js—a tiny, powerful search library that runs entirely in the browser. But here's the thing: a naive implementation can bloat your search index to several megabytes. At Jottings, we've optimized our Lunr.js implementation to keep search indexes under 100KB, even for sites with hundreds of posts.

Let me walk you through how we got there.

Why Not Just Use the Server?

The obvious question first: why not add a backend search API? Because one of Jottings' core values is simplicity. Your site is a collection of static HTML files served from a CDN. No database. No API. No monthly hosting bills.

Client-side search fits perfectly into this philosophy. When someone visits your site and searches, the browser downloads a small index and searches it locally. It's fast, it's private (search queries don't leave the user's browser), and it costs nothing to operate.

The tradeoff? The index needs to be compact. Nobody wants to download a 5MB JavaScript file just to search your blog.

Lunr.js: Tiny but Mighty

Lunr.js is a small search library that runs entirely in the browser; its own tagline is "a bit like Solr, but much smaller and not as bright." It's only about 30KB minified and gzipped, but it's capable of sophisticated full-text search: stemming, field boosting, fuzzy matching, and more.

Here's how it works in practice:

  1. At build time, we create a Lunr index from your site's content
  2. We serialize that index as JSON and include it in your site bundle
  3. On the client, Lunr loads the index and handles search queries instantly

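In code, that build-and-load cycle looks roughly like this. It's a minimal sketch assuming lunr 2.x, not the actual Jottings build script; loadPosts and the search-index.json file name are placeholders for illustration.

// Build step (Node) -- a minimal sketch assuming lunr 2.x, not the actual Jottings build script.
const fs = require('fs')
const lunr = require('lunr')

const docs = loadPosts() // hypothetical helper: returns [{ id, title, excerpt, tags, body }, ...]

const idx = lunr(function () {
  this.ref('id')
  this.field('title')
  this.field('body')
  docs.forEach(doc => this.add(doc))
})

// Serialize once at build time and ship the result as a static file.
fs.writeFileSync('search-index.json', JSON.stringify(idx))

// Client side (browser): fetch the prebuilt index and rehydrate it -- no re-indexing in the browser.
fetch('/search-index.json')
  .then(res => res.json())
  .then(data => {
    const idx = lunr.Index.load(data)
    console.log(idx.search('lunr'))
  })
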
For a site with 500 posts, you're typically looking at 40-80KB for the index itself (compressed), plus another 30KB for Lunr. Still less than a high-quality hero image.

The Optimization Techniques

Building the index is straightforward. Keeping it small is an art.

Strip Markdown (and HTML)

Our first optimization: we don't index the raw post content. We strip all markdown syntax and HTML tags first.

Why? Because markup is noise: syntax characters, HTML tags, and link URLs all end up tokenized alongside the words people actually search for. All Lunr needs are the words themselves: bold and text. Stripping the markup first reduced our index sizes by 30-40%.

// Before: "**Why I love Lunr.js** - A short post"
// After: "Why I love Lunr.js A short post"
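
Here's a rough sketch of that cleanup step. The regexes are an illustration of the idea rather than the exact rules Jottings uses; a real build would more likely take the plain-text output of its markdown parser.

// A rough illustration of pre-index cleanup -- not the exact rules Jottings uses.
function toPlainText(markdown) {
  return markdown
    .replace(/```[\s\S]*?```/g, ' ')          // drop fenced code blocks entirely
    .replace(/<[^>]+>/g, ' ')                 // drop HTML tags
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1') // images: keep alt text, drop the URL
    .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1')  // links: keep link text, drop the URL
    .replace(/[*_~`#>-]/g, ' ')               // strip emphasis, heading, and list markers
    .replace(/\s+/g, ' ')                     // collapse whitespace
    .trim()
}

toPlainText('**Why I love Lunr.js** - A short post')
// => 'Why I love Lunr.js A short post'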

Field Weighting, Not Duplication

Lunr supports field boosting—you can tell it to weight words in titles more heavily than words in body text. We use this instead of duplicating content.

const index = lunr(function () {
  this.ref('id')                       // each doc's unique key; matches the id we store per result
  this.field('title', { boost: 10 })
  this.field('excerpt', { boost: 5 })
  this.field('tags', { boost: 3 })
  this.field('body')

  docs.forEach(doc => this.add(doc))   // docs: the cleaned-up post objects from the build step
})

This way, a search for "optimization" matches the title with high relevance, but you're not storing the word multiple times.
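
At query time, the boost shows up in the scores. A quick sketch of what a search against the index above might return (the refs and scores are made up for illustration):

// Results come back sorted by score; a title hit outranks a body-only hit thanks to the boost.
const results = index.search('optimization')

// Hypothetical output -- actual refs and scores depend on your content:
// [
//   { ref: 'post-about-optimization', score: 9.21, matchData: { ... } },
//   { ref: 'post-mentioning-it-in-passing', score: 1.07, matchData: { ... } }
// ]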

Aggressive Stemming and Stopwords

Lunr includes English stemming by default. This reduces inflected forms to their root: "searching", "searched", "searches" all become "search". Brilliant for index size.

Lunr's default pipeline also filters out common stopwords (the, a, and, etc.). These don't improve search quality; they just add unnecessary bulk.

// Lunr automatically handles this, but here's what it does:
// Input: "the best practices for searching content"
// Indexed tokens: "best", "practic", "search", "content"
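
You can see this for yourself by running the pieces of Lunr's default pipeline by hand. A rough illustration, assuming lunr 2.x (the full default pipeline also includes a trimmer that strips leading and trailing punctuation):

// Running the default stop word filter and stemmer manually (lunr 2.x).
const tokens = lunr.tokenizer('the best practices for searching content')

const processed = tokens
  .filter(token => lunr.stopWordFilter(token))  // drops "the" and "for"
  .map(token => lunr.stemmer(token))            // "practices" -> "practic", "searching" -> "search"

console.log(processed.map(token => token.toString()))
// => [ 'best', 'practic', 'search', 'content' ]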

Exclude Metadata You Don't Need

At build time, we consciously decide what to index. Do you really need the author's bio in search? Do you need publishing metadata? We don't include it.

// We index:
const doc = {
  title: post.title,
  excerpt: post.excerpt, // first ~150 chars
  tags: post.tags.join(' '),
  body: post.body
}

// We don't index:
// - author bio
// - creation timestamps
// - revision history
// - internal metadata

This is a design choice that saves 20-30% in index size while actually improving search quality (less noise).

Store Only What You Need

The serialized index handles the matching, but a Lunr result only hands back a document ref, so we also ship a small store of per-result metadata for rendering. We're ruthless about what goes in that store.

// For each result, we store only:
{
  id: 'post-id',
  url: '/posts/my-post',
  title: 'Post Title',
  excerpt: 'First 150 characters...'
}

// We don't store:
// - Full body text
// - Image URLs
// - Dates (these are already in the page)

When someone clicks a search result, they navigate to the actual post page. That page has the full content, images, and everything else. The index is just for discovery.
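
Putting this together on the client: the small metadata store above is what turns a Lunr match into something you can render. A minimal sketch, assuming the serialized index and the metadata map are shipped as two static files (the file names and layout are illustrative, not Jottings internals):

// Minimal client-side wiring -- file names and layout are assumptions, not Jottings internals.
let idx = null
let docsById = {}

Promise.all([
  fetch('/search-index.json').then(res => res.json()),  // the serialized Lunr index
  fetch('/search-docs.json').then(res => res.json())    // { [id]: { url, title, excerpt } }
]).then(([indexData, docs]) => {
  idx = lunr.Index.load(indexData)
  docsById = docs
})

function search(query) {
  if (!idx) return []                          // index still loading
  return idx.search(query).map(result => ({
    ...docsById[result.ref],                   // url, title, excerpt for rendering
    score: result.score
  }))
}

// search('lunr') might return:
// [{ url: '/posts/my-post', title: 'Post Title', excerpt: 'First 150 characters...', score: 4.2 }, ...]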

The Numbers

Here's what this looks like in practice:

  • Lunr.js library: 30KB (minified + gzipped)
  • Search index (300 posts): 45KB (compressed)
  • Total bundle: 75KB

Compare that to asking users to download your full content to search it locally, or forcing them to rely on an external search service. This is genuinely tiny.

For sites with 500-1000 posts, we're still under 100KB. At around 2000 posts, you hit that ceiling—at which point you might consider pagination or filtering to keep the index snappy.

The User Experience

The result is search that feels instant. There's no network request, no server latency, no wait time. Users type, they see results. That's powerful.

It's also private—the search query never leaves their device. No tracking, no analytics pollution. Just a person searching your content.

When This Breaks Down

To be honest: client-side search isn't for everything. If you have 10,000+ posts or need real-time full-text search with typo correction, you'll probably want a service like Algolia or Meilisearch.

But for most blogs and smaller publications? Client-side search is a gift. It's fast, it's cheap, and it's private. Lunr.js makes it possible at a scale that fits the serverless philosophy.

If You're Building This Yourself

If you're implementing Lunr.js for your own site, here's the takeaway: be ruthless about what you index. Ask yourself before including each field: "Does this improve search quality?" If the answer is no, leave it out.

The best index isn't the most complete one. It's the smallest one that still finds what people are looking for.

At Jottings, we've made this a default. When you generate a site, search just works—optimized, fast, and tiny. You don't have to think about index size or server costs or search infrastructure. We've already figured that out for you.

That's the goal: giving writers the tools they need, without the overhead.