Why We Split Content from HTML Generation

One of the more interesting architectural decisions I made recently for Jottings was moving all HTML generation from the API layer into the build system. It sounds simple on the surface, but it's one of those changes that rippled through the entire codebase and made me rethink how content flows through the system.

Let me walk you through the problem we were solving and why this approach makes so much sense.

The Problem: HTML Generated in Three Places

When I first built Jottings, I didn't think too carefully about where HTML should live. Content came in as markdown, and I needed to convert it somewhere, so... I converted it at multiple points:

The API layer generated HTML when users created or updated a jot, storing it in DynamoDB alongside the raw markdown
The build system regenerated the same HTML when building the static site
The feed generators also created HTML, sometimes using the stored version, sometimes regenerating fresh

This is a classic symptom of distributed responsibility. Each layer had slightly different needs (feeds wanted semantic HTML, the site wanted styled HTML with custom markdown processing), so each layer solved the problem independently. The result? Inconsistency and redundancy.

If I updated the markdown parser in the build system, users reading feeds might see different formatting than the site showed. An older jot could also sit in the database with HTML rendered by an earlier version of the parser, so the same markdown produced different output depending on which layer served it. It was a maintainability nightmare waiting to happen.
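To make that drift concrete, here is a minimal sketch. The two renderers are hypothetical stand-ins for "the parser before an update" and "the parser after an update"; they are not Jottings' actual parser, and the `StoredJot` shape is assumed for illustration.

```typescript
// Hypothetical stand-ins for two versions of a markdown renderer.
// v1 only understands **bold**; v2 also understands *italic*.
function renderV1(markdown: string): string {
  return markdown.replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");
}

function renderV2(markdown: string): string {
  return renderV1(markdown).replace(/\*(.+?)\*/g, "<em>$1</em>");
}

// Assumed record shape from the old design: markdown plus frozen HTML.
interface StoredJot {
  markdown: string;
  html: string; // rendered once at write time, never refreshed
}

// The API rendered with whichever parser was deployed when the jot was saved.
function saveJot(markdown: string): StoredJot {
  return { markdown, html: renderV1(markdown) };
}

// True when the stored HTML no longer matches what the current build produces.
function hasDrifted(jot: StoredJot): boolean {
  return jot.html !== renderV2(jot.markdown);
}
```

A jot saved as `**bold** and *italic*` keeps its literal asterisks in the stored HTML, while a fresh build renders them as emphasis, so `hasDrifted` returns `true` for it.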

The Real Issue: Storage and Compute Tradeoff

But the inconsistency wasn't even the core problem—it was the storage bloat.

Storing rendered HTML in DynamoDB means keeping both the raw markdown and the rendered HTML for every single jot. For a user with 500 posts, that roughly doubles the data stored, assuming the HTML is about as large as the markdown it came from. On a serverless platform where you pay for every byte, that adds up.
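The arithmetic can be sketched as follows. The per-jot sizes are assumed numbers chosen for illustration, not measurements from Jottings, and character counts are used as a rough proxy for bytes (exact only for ASCII text).

```typescript
interface JotSizes {
  markdownChars: number;
  htmlChars: number;
}

interface StorageEstimate {
  before: number; // markdown + stored HTML
  after: number;  // markdown only
  saved: number;
}

// Sums content size with and without the rendered HTML copy.
function estimateStorage(jots: JotSizes[]): StorageEstimate {
  let markdownTotal = 0;
  let htmlTotal = 0;
  for (const jot of jots) {
    markdownTotal += jot.markdownChars;
    htmlTotal += jot.htmlChars;
  }
  const before = markdownTotal + htmlTotal;
  const after = markdownTotal;
  return { before, after, saved: before - after };
}

// Assumption: 500 jots, 400 characters of markdown each,
// and rendered HTML of about the same size.
const jots: JotSizes[] = [];
for (let i = 0; i < 500; i++) {
  jots.push({ markdownChars: 400, htmlChars: 400 });
}

const estimate = estimateStorage(jots);
// estimate.before = 400000, estimate.after = 200000, estimate.saved = 200000
```

Real HTML is often somewhat larger than its markdown source once tags and attributes are added, so the saving in practice may be more than half.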

More importantly, HTML is a presentation artifact. It's not content. The real content is the markdown. Why was I storing presentation in my database? That feels backwards.

The Solution: Single Source of Truth

The solution I landed on is architecturally cleaner: the API stores only raw markdown, and the build system—which runs asynchronously—handles all HTML generation.

Here's how it works now:

When a user creates a jot:

The API accepts the raw markdown
Stores it as-is in DynamoDB (no HTML generation)
Triggers a build via SQS
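The write path above can be sketched like this. The table and queue are in-memory stand-ins so the example runs on its own; the real system uses DynamoDB and SQS, and every name here (`JotRecord`, `BuildMessage`, `createJot`) is hypothetical.

```typescript
// Assumed record shape: raw content only, no rendered HTML attribute.
interface JotRecord {
  siteId: string;
  jotId: string;
  markdown: string;
  createdAt: string;
}

// Assumed shape of the message that asks for a rebuild.
interface BuildMessage {
  siteId: string;
  reason: "jot-created";
}

// In-memory stand-in for the DynamoDB table.
class InMemoryTable {
  readonly items: JotRecord[] = [];
  put(item: JotRecord): void {
    this.items.push(item);
  }
}

// In-memory stand-in for the SQS build queue.
class InMemoryQueue {
  readonly messages: BuildMessage[] = [];
  send(message: BuildMessage): void {
    this.messages.push(message);
  }
}

// Stores the markdown exactly as received, then queues a build.
// No markdown-to-HTML conversion happens anywhere in this function.
function createJot(
  table: InMemoryTable,
  queue: InMemoryQueue,
  siteId: string,
  jotId: string,
  markdown: string,
  now: Date = new Date()
): JotRecord {
  const record: JotRecord = { siteId, jotId, markdown, createdAt: now.toISOString() };
  table.put(record);
  queue.send({ siteId, reason: "jot-created" });
  return record;
}
```

The point of the sketch is what is missing: the handler never imports or calls a renderer, so it cannot disagree with the build about how a jot should look.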

During the static site build:

The build system reads all the raw markdown
Processes it with the custom markdown parser
Generates all HTML pages fresh
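A minimal sketch of that build step follows. `toyRender` is a deliberately tiny stand-in for the custom markdown parser (it escapes HTML and handles only `**bold**`), and the page path layout is an assumption.

```typescript
interface Jot {
  slug: string;
  markdown: string;
}

interface Page {
  path: string;
  html: string;
}

type MarkdownRenderer = (markdown: string) => string;

// Toy stand-in for the real parser: escapes HTML, then converts **bold**.
function toyRender(markdown: string): string {
  const escaped = markdown
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  const withBold = escaped.replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");
  return `<p>${withBold}</p>`;
}

// Renders every jot from its raw markdown on each build.
// Nothing is read from a cached or stored HTML field.
function buildSite(jots: Jot[], render: MarkdownRenderer): Page[] {
  return jots.map((jot) => ({
    path: `/${jot.slug}/index.html`, // assumed URL layout
    html: render(jot.markdown),
  }));
}
```

Because the renderer is passed in as a parameter, swapping in a fixed or upgraded parser changes every page on the next build without touching stored data.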

For feeds:

RSS uses plain text (no HTML tags)
JSON feeds use semantic HTML via the build system
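Both feed formats can be derived from the same markdown at build time. The sketch below assumes a toy renderer and a toy plain-text conversion that only strip or convert `**bold**`; the real conversions are not described here. `content_html` is the field name the JSON Feed format uses for HTML content.

```typescript
interface FeedJot {
  id: string;
  url: string;
  markdown: string;
}

function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Toy stand-in for the build system's renderer (handles only **bold**).
function renderHtml(markdown: string): string {
  const withBold = escapeXml(markdown).replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");
  return `<p>${withBold}</p>`;
}

// Toy plain-text conversion: drops the ** markers, keeps the words.
function toPlainText(markdown: string): string {
  return markdown.replace(/\*\*(.+?)\*\*/g, "$1");
}

// RSS item whose description is plain text, escaped for XML.
function rssItem(jot: FeedJot): string {
  const description = escapeXml(toPlainText(jot.markdown));
  return `<item><link>${escapeXml(jot.url)}</link><description>${description}</description></item>`;
}

// JSON feed item carrying HTML produced by the same renderer the site uses.
function jsonFeedItem(jot: FeedJot): { id: string; url: string; content_html: string } {
  return { id: jot.id, url: jot.url, content_html: renderHtml(jot.markdown) };
}
```

Both functions start from `jot.markdown`, so neither feed can fall out of step with a stored copy of the HTML.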

This approach has a few immediate benefits:

Consistency: There's now only one place where markdown becomes HTML. The raw markdown is the single source of truth, and the build system is the only renderer. If I fix the markdown parser, every output (site pages, feeds, everything) picks up that fix on the next build.

Smaller storage footprint: We're no longer storing every jot twice. A 500-post site keeps only its markdown in DynamoDB, with no rendered copy alongside it.

Separation of concerns: The API's job is to store and serve data. The build system's job is to generate presentations. Each layer has a single responsibility.

Future-Proofing with Feature Gates

Here's where it gets interesting for the roadmap.

Since the build system now controls all rendering, I can introduce a renderMarkdown flag. In the future, non-Pro users might get plain text rendering (escaping HTML, preserving line breaks, but no fancy markdown processing), while Pro users get the full custom markdown parser treatment.

This is impractical if HTML is being generated in the API layer. You'd have to regenerate it later, or store multiple versions, or make complex decisions at read time. By centralizing rendering at build time, feature-gating becomes straightforward: just pass a flag when queuing the build.
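A sketch of how such a gate might look, assuming the flag travels on the build message. Only the `renderMarkdown` name comes from the plan above; `GatedBuildMessage`, `buildMessageFor`, `renderPlainText`, and `renderJot` are hypothetical, and the full parser is passed in rather than implemented.

```typescript
// Assumed message shape: the flag rides along when the build is queued.
interface GatedBuildMessage {
  siteId: string;
  renderMarkdown: boolean;
}

// Hypothetical gate: Pro sites get markdown rendering, others get plain text.
function buildMessageFor(siteId: string, isPro: boolean): GatedBuildMessage {
  return { siteId, renderMarkdown: isPro };
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Plain text rendering: escape HTML and preserve line breaks, nothing else.
function renderPlainText(source: string): string {
  return escapeHtml(source).replace(/\r?\n/g, "<br>\n");
}

// The build picks a rendering path per site based on the flag.
function renderJot(
  source: string,
  message: GatedBuildMessage,
  renderFull: (markdown: string) => string
): string {
  return message.renderMarkdown ? renderFull(source) : renderPlainText(source);
}
```

Since stored jots are raw markdown either way, flipping a site from one tier to the other needs no data migration, only a rebuild with a different flag.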

The Tradeoffs

There's one tradeoff worth mentioning: the build is now responsible for all rendering, which means builds take slightly longer. On a serverless platform, that's a reasonable price. Builds are infrequent relative to API calls, and they're already queued asynchronously. A few extra seconds per build is better than inconsistent output and duplicated storage.

Why This Matters for Microblogging

For a platform like Jottings, this architecture feels right. Users aren't publishing and immediately viewing—they're writing, then their content gets built into a static site. The separation between "storing content" and "publishing content" is intentional. The API layer is quiet and fast. The build system does the heavy lifting asynchronously.

It's a small change that made the system feel more coherent. Content flows in one direction: user creates markdown → API stores it → build system publishes it. No backtracking, no inconsistency, no redundancy.

That's the kind of architecture I want to maintain as Jottings grows.

Building something that processes user content? Think about where HTML (or JSON, or any rendered output) actually lives. Sometimes the simplest approach (store raw data, generate presentation at build time) is the cleanest one.