A Skill Is a Folder, Not a Prompt: What Anthropic Learned Running Hundreds of Them

TL;DR

Anthropic says its engineering teams have run hundreds of Claude Code Skills and found that the most useful ones package instructions, scripts, references and guardrails into reusable folders. The company says verification Skills had the largest effect on output quality, though best practices are still developing.

Anthropic says its engineering teams have used hundreds of Claude Code Skills to turn repeated AI-agent instructions into reusable, versioned folders that can include scripts, references, templates and guardrails, a shift the company says can make agent work more consistent and easier to improve over time.

The development was described in “Lessons from building Claude Code: How we use skills”, a June 3, 2026 Claude blog post by Thariq Shihipar, a Claude Code engineer. According to the write-up, a Skill is not just a saved prompt in Markdown; it is a folder an agent can discover, read and run.

Anthropic says a Skill can include a root SKILL.md file, deeper reference material, executable scripts, reusable assets, configuration files, hooks and memory. The company’s model is based on progressive disclosure: the agent reads the root instructions first, then pulls in more detailed material only when the task calls for it.
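
The read-the-root-first pattern can be sketched in a few lines of Python. This is a minimal illustration, not Claude Code's actual loader: the function names are hypothetical, and the only layout assumed is a root SKILL.md beside a references/ directory, as the post describes.

```python
from pathlib import Path


def load_root(skill_dir: Path) -> str:
    """Return the root instructions, the only file read up front."""
    return (skill_dir / "SKILL.md").read_text(encoding="utf-8")


def list_references(skill_dir: Path) -> list[str]:
    """List deeper reference files by name without reading their contents."""
    ref_dir = skill_dir / "references"
    if not ref_dir.is_dir():
        return []
    return sorted(p.name for p in ref_dir.iterdir() if p.is_file())


def load_reference(skill_dir: Path, name: str) -> str:
    """Read one reference file, and only when the task calls for it."""
    if name not in list_references(skill_dir):
        raise FileNotFoundError(f"no reference named {name!r}")
    return (skill_dir / "references" / name).read_text(encoding="utf-8")
```

The design point is that listing names is cheap while reading contents costs context, so the detailed material stays on disk until a task needs it.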

After cataloging its internal Skills, Anthropic said they fell into nine categories: API references, product verification, data fetching and analysis, business-process automation, code scaffolding, code review, CI/CD, runbooks and infrastructure operations. Anthropic said verification Skills, which check work rather than produce it, had the largest measured effect on output quality.

At a glance
When: Anthropic blog published June 3, 2026;…
The development: Anthropic published lessons from using hundreds of Claude Code Skills across its engineering organization, arguing that Skills work as reusable folders rather than saved prompts.
AI Dispatch · Insights · 1 July 2026

A Skill is a folder, not a prompt

Anthropic published what it learned running hundreds of Skills across its own engineering org. Read as a business memo, the point is bigger than a coding trick: this is how ad-hoc prompting becomes durable institutional capability — the SOPs your agents actually follow, versioned and shared.

✕ The misconception

“A Skill is just a clever markdown prompt you save in a file.”

✓ What it actually is

A folder the agent can discover, read & run — instructions, scripts, references, templates, config & on-demand hooks.

Anatomy of a Skill — the file system is context engineering
my-skill/          the unit you share & version
├─ SKILL.md        root instructions + a description written for the model (its trigger)
├─ references/     deep detail pulled in only when needed (progressive disclosure)
├─ scripts/        real code, so the agent composes instead of rebuilding boilerplate
├─ assets/         templates & files to copy into the output
├─ config.json     setup the agent asks for if it’s missing (e.g. which Slack channel)
└─ hooks + memory  on-demand guardrails + an append-only log so it remembers
Why it matters: the folder itself is the knowledge base. The agent reads the root, then reaches deeper only when the task demands it — the same way you’d hand a new hire a one-pager that points to the detailed docs.
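
The config.json entry, described as setup the agent asks for if it is missing, could be backed by a small check like the one below. This is a sketch under stated assumptions: the key name slack_channel is hypothetical (the source gives a Slack channel only as an example), and the function name is invented for illustration.

```python
import json
from pathlib import Path

# Hypothetical setting name; the source mentions a Slack channel only as an example.
REQUIRED_KEYS = ("slack_channel",)


def missing_settings(skill_dir: Path, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that config.json does not yet provide."""
    config_path = skill_dir / "config.json"
    if not config_path.is_file():
        return list(required)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        return list(required)
    # A key counts as missing when it is absent or set to an empty value.
    return [key for key in required if not config.get(key)]
```

An agent following this pattern would ask the user for each returned key before doing anything else, then write the answers back to config.json.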
The nine types — a gap-analysis map for your own library
1. Library / API reference
2. Product verification ★ top impact
3. Data fetching & analysis
4. Business-process automation
5. Code scaffolding & templates
6. Code quality & review
7. CI/CD & deployment
8. Runbooks
9. Infrastructure operations
By Anthropic’s own measurement, verification Skills — the ones that check the work — moved output quality the most. If you build one category well, build that one.
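
To make "check the work" concrete, here is a minimal sketch of the kind of script a verification Skill might ship. The source does not show Anthropic's verification code, so everything here is assumed: the names, the changelog scenario and the two checks are all hypothetical.

```python
from typing import Callable, NamedTuple


class CheckResult(NamedTuple):
    name: str
    passed: bool


def verify(output: str, checks: dict[str, Callable[[str], bool]]) -> list[CheckResult]:
    """Run every check against finished work; never modify the work itself."""
    return [CheckResult(name, bool(check(output))) for name, check in checks.items()]


def failures(results: list[CheckResult]) -> list[str]:
    """Names of the checks that did not pass."""
    return [r.name for r in results if not r.passed]


# Hypothetical checks for a generated changelog entry.
CHANGELOG_CHECKS = {
    "not_empty": lambda text: bool(text.strip()),
    "no_todo_left": lambda text: "TODO" not in text,
}
```

Calling failures(verify(draft, CHANGELOG_CHECKS)) returns the names of the checks a draft fails, which gives the agent a concrete list to fix rather than a general instruction to be careful.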
The craft — what separates a good Skill from a useless one
Gotchas = highest-signal section
Describe for the model, not humans (it’s the trigger)
Don’t state the obvious
Ship scripts, not just prose
On-demand guardrail hooks (/careful, /freeze)
Let it remember (log / SQLite)
Don’t railroad; leave room to adapt
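
The "let it remember" item names SQLite as one option. A minimal sketch of an append-only memory using Python's standard sqlite3 module follows. The table name, column names and function names are assumptions made for illustration and do not come from Anthropic's post.

```python
import sqlite3


def open_memory(path: str) -> sqlite3.Connection:
    """Open (or create) the memory database and its single table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP, "
        "note TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def remember(conn: sqlite3.Connection, note: str) -> None:
    """Append a note; this module deliberately offers no update or delete."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO memory (note) VALUES (?)", (note,))


def recall(conn: sqlite3.Connection, limit: int = 20) -> list[str]:
    """Return the most recent notes, oldest first."""
    rows = conn.execute(
        "SELECT note FROM memory ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return [row[0] for row in reversed(rows)]
```

Keeping the interface to insert and read is what makes the log append-only in practice: earlier observations stay intact even when later runs disagree with them.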
The take

The knowledge of how your organization actually operates can be captured, versioned, shared & executed — and the thing capturing it is a humble folder with a script and a gotchas list inside. For the builder, that’s context engineering with real tools attached. For whoever owns the budget, it’s the difference between AI that starts from zero every morning and an asset that compounds. Caveats: best practices are still evolving, checked-in Skills cost context, and curation beats accumulation. Start with one Skill, one gotcha, and the category that catches your mistakes.

Source: “Lessons from building Claude Code: How we use skills,” Thariq Shihipar (Anthropic), Claude blog, 3 June 2026. Categories, examples & measured claims are Anthropic’s; framing is the author’s. Docs: code.claude.com/docs/en/skills.
thorstenmeyerai.com

Reusable Instructions Become Team Assets

The report matters because it frames AI-agent guidance as operational infrastructure, not a set of one-off prompts. If the approach works as Anthropic describes, teams can package the way they review code, test products, deploy services or handle incidents into shared folders that agents can apply repeatedly.

That could affect how companies manage institutional knowledge. Instead of relying on private prompt habits, scattered wiki pages or repeated manual instructions, a team could maintain a versioned Skills library that changes as edge cases appear. Anthropic’s claim is that these units can become compounding assets, though the amount of maintenance required will vary by team.

Claude Code Skills Take Shape

Skills are part of the broader push to make coding agents more reliable in real engineering settings. The source material contrasts two approaches: repeatedly telling an agent how to behave each day, or capturing that knowledge once in a reusable package that can be shared and updated.

The Thorsten Meyer AI Dispatch, published July 1, 2026, interprets Anthropic’s post as a business memo as much as a developer guide. Its central reading is that Skills can function like standard operating procedures for agents, with documents, templates and code stored in the file system rather than buried in conversation history.

The most practical guidance in the source material is narrow: start with one Skill, include at least one hard-won gotcha, and give priority to the category that catches mistakes. Based on Anthropic’s reported measurement, that points many teams first toward verification workflows.

“Lessons from building Claude Code: How we use skills”

— Thariq Shihipar, Anthropic

Skill Quality Still Varies

Several details remain open. Anthropic has not provided, in the supplied material, a full public breakdown of the hundreds of internal Skills, the exact measurement method behind its quality claims, or how results differed across teams and task types.

It is also unclear how easily the approach transfers to organizations with weaker documentation habits, strict security rules or fast-changing internal systems. The source material cautions that best practices are still developing, that checked-in Skills can add context cost, and that curation matters more than accumulation.

Teams Test Smaller Libraries

The next step for readers is likely experimentation rather than wholesale adoption. Anthropic’s advice, as summarized in the source material, is to begin with one narrow Skill, add real scripts where possible, write triggers for the model rather than humans, and record known failure cases.
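
That starting advice can be turned into a rough self-check for a first Skill. The sketch below is hypothetical throughout: it assumes SKILL.md carries a line beginning with "description:" and a heading titled Gotchas, and that scripts live in a scripts/ folder. The source names these ingredients but does not confirm any exact syntax.

```python
import re
from pathlib import Path


def review_skill(skill_dir: Path) -> list[str]:
    """Return human-readable gaps in a Skill draft; an empty list means none found."""
    gaps = []
    root = skill_dir / "SKILL.md"
    text = root.read_text(encoding="utf-8") if root.is_file() else ""
    if not text:
        gaps.append("SKILL.md is missing or empty")
    # Assumed convention: a line starting with "description:" holds the trigger.
    if not re.search(r"^description:\s*\S", text, flags=re.MULTILINE | re.IGNORECASE):
        gaps.append("no description for the model to match against")
    # Assumed convention: known failure cases live under a "Gotchas" heading.
    if not re.search(r"^#+\s*gotchas\b", text, flags=re.MULTILINE | re.IGNORECASE):
        gaps.append("no Gotchas section recording known failure cases")
    scripts = skill_dir / "scripts"
    if not (scripts.is_dir() and any(p.is_file() for p in scripts.iterdir())):
        gaps.append("no scripts; the Skill is prose only")
    return gaps
```

A check like this cannot judge whether a description or gotcha is any good; it only flags drafts that skip the ingredients the post treats as essential.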

For engineering leaders, the near-term test is whether a Skills library reduces repeated instruction, improves review or verification quality, and stays maintainable as it grows. The clearest milestone will be whether teams can show measurable gains from specific Skills, especially in product verification and code-quality workflows.

Key Questions

What did Anthropic publish?

Anthropic published a June 3, 2026 Claude blog post by Thariq Shihipar describing how its engineering teams use Skills in Claude Code.

What is a Claude Code Skill?

According to Anthropic’s description, a Skill is a folder that can contain instructions, scripts, references, templates, configuration and hooks that an agent can read or run.

Which Skills had the biggest reported effect?

Anthropic’s reported measurement, as summarized in the source material, found that verification Skills had the largest effect on output quality.

Is this only useful for developers?

The immediate example is AI coding agents, but the broader claim is about capturing repeatable work processes as shared, versioned assets.

What remains unproven?

The supplied material does not show the full dataset, measurement method or outside replication. It remains unclear how well large Skill libraries will scale across different organizations.

Source: Thorsten Meyer AI
