Setting Up robots.txt for Search Engines and AI

When you publish something on the internet, you're making a choice about who gets to see it. That choice doesn't just apply to your readers—it also applies to search engine bots, AI training crawlers, and other automated visitors to your site.

That's where robots.txt comes in.

If you've never thought about it before, don't worry. Most people haven't. But if you care about search visibility, content ownership, or which AI models can learn from your writing, it's worth understanding the basics.

What Does robots.txt Actually Do?

Think of robots.txt as a polite "rules of the house" file for bots. It sits in the root of your website and tells automated crawlers what they can and can't access.

Here's the important part: it's not a security mechanism. Any bot can choose to ignore it. But the major search engines (Google, Bing, DuckDuckGo) and reputable AI crawlers respect these rules. It's an honor system that mostly works.

When a well-behaved bot visits your site, the first thing it does is check robots.txt for any restrictions. If you haven't set one up, the bot assumes everything is fair game.

The robots.txt File Format (The Simple Version)

You don't need to write this yourself in Jottings—we generate it automatically. But it helps to understand what's happening behind the scenes.

A basic robots.txt looks like this:

User-agent: *
Allow: /
Disallow: /admin

Breaking it down:

  • User-agent: Which bot this rule applies to (* means all bots)
  • Allow: What the bot CAN access
  • Disallow: What the bot CANNOT access (see the note below)
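
One detail worth knowing: Disallow rules are matched as path prefixes, so a single line covers every URL whose path starts with that value. For example, with a hypothetical /drafts section:

User-agent: *
Disallow: /drafts

That blocks /drafts, /drafts/2025, and anything else whose path begins with /drafts.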

You can have different rules for different bots. For example:

User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Allow: /

This says: "GPTBot, please don't crawl anything. Googlebot, you can crawl everything." Helpful, right?

What Jottings Does Automatically

When you set up a Jottings site, we create a robots.txt file for you with sensible defaults:

User-agent: Googlebot
User-agent: Bingbot
User-agent: DuckDuckBot
Allow: /

User-agent: GPTBot
User-agent: CCBot
User-agent: anthropic-ai
Disallow: /

Why these defaults? Because most people want search engines to find their content (that's how you get readers), but they're uncertain about AI training crawlers. This configuration gives you both: SEO visibility and AI crawler control.

Your AI Crawler Control Settings

Jottings gives you a simple setting to customize this: Allow AI crawlers in your site settings.

When you toggle this on:

  • GPTBot (OpenAI), anthropic-ai (Anthropic/Claude), CCBot (Common Crawl), and other AI training bots can crawl your content
  • Your content can be used to train future AI models
  • This helps these models improve and become more knowledgeable

When you toggle it off (the default):

  • AI crawlers are blocked from accessing your content
  • Your microblog remains a "training-data-free zone"
  • Search engines still get full access

It's a straightforward choice about your content's future.
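
For reference, toggling the setting on would essentially flip the AI-crawler group in your generated robots.txt from Disallow to Allow, so it would read something like this (the exact list of bots may vary):

User-agent: GPTBot
User-agent: CCBot
User-agent: anthropic-ai
Allow: /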

Why Would You Allow AI Crawlers?

You might think "Of course I'll block AI crawlers!" But there are genuine reasons someone might allow them:

You want your ideas to influence AI models. If you're writing about niche topics, research, or unique perspectives, having that content in AI training data means future models will have access to your thinking. That's a legacy worth considering.

You believe AI should be trained on diverse human writing. Some people think preventing AI from learning from human content slows down progress and makes models less representative.

You're not concerned about copyright issues. AI training is currently a gray legal area. If you're comfortable with this usage, you can opt in.

You want maximum discoverability. Some AI-related bots that respect robots.txt might be benign, or even help new readers find you. Blocking them all is an all-or-nothing approach.

Why Would You Block Them?

On the flip side, plenty of reasons to keep the default setting:

You value content ownership. Your microblog is your writing. You might reasonably want control over where it ends up and how it's used.

You're concerned about copyright and usage rights. The legal status of AI training data is murky. Playing it safe by opting out makes sense.

You don't want your casual posts in an AI model. Not everything you write needs to be permanent, and once a post ends up in training data it can persist inside models for years. You might prefer to keep throwaway thoughts ephemeral.

It's the safer default. Why allow it unless you have a specific reason to?

We Also Generate ai.txt

Beyond robots.txt, Jottings creates an ai.txt file for you too. This is a newer, still-emerging convention aimed specifically at AI and machine learning crawlers. It gives you more detailed control over how different types of bots interact with your content.

Your ai.txt includes information about your site, contact methods, and policies around AI usage. It's more conversational than robots.txt—a way to communicate directly with AI providers about your preferences.
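
To give a feel for it, here is a purely illustrative sketch of the kind of information such a file can carry; the field names and exact wording Jottings generates will differ:

# ai.txt (illustrative example only)
# Site: https://yoursite.jottings.me
# Contact: you@example.com
# AI training: not permitted; see robots.txt for per-crawler rules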

You don't need to do anything with this file. We generate it automatically based on your site settings.

Checking Your Own Rules

Want to see what robots.txt you're generating? Just visit:

https://yoursite.jottings.me/robots.txt

You'll see the actual file that bots are reading. Good for debugging if something feels off.

The same goes for ai.txt:

https://yoursite.jottings.me/ai.txt

The Bigger Picture

Here's what I think matters: you should be conscious about this choice.

A lot of people publish on the internet without ever thinking about what automated systems do with their content. They don't know whether Google is indexing their site (great for reach) or whether Claude has trained on their posts (something they may or may not be comfortable with).

Jottings tries to make this explicit. We generate these files for you automatically, but we also let you control the rules. That's respect for your content and your choices.

Whether you allow AI crawlers or not, you're making an informed decision. And that's what matters.

Getting Started

To adjust your AI crawler settings in Jottings:

  1. Go to your site settings
  2. Find the "AI Crawler Permissions" option
  3. Toggle "Allow AI crawlers" on or off
  4. Save your changes

Your robots.txt and ai.txt files update immediately. Search engines will see the changes within a few days as they re-crawl your site.

That's it. No complicated setup, no file editing, just a simple choice about your content and the bots that visit it.


If you're publishing a microblog and thinking about SEO, content ownership, and how your writing fits into the broader internet ecosystem, these are good questions to be asking. Jottings is built for people who think about these things—and respect their readers enough to be intentional about them.

Want a platform that handles this automatically while giving you full control? Jottings is here.