← Journal/2026-05-31·9 min·Claude Code

Claude Code Skills: What They Are and How I Use Them to Build AI Agents

A practical explainer of Claude Code skills — what they are, how they work, and real examples from my workspace including article-humanizer, geo-faq-architect, and llm-citation-tracker.

By Harel Asaf·AI Builder·Tel Aviv

Claude Code Skills: What They Are and How I Use Them to Build AI Agents

I want to clear something up before we go any further.

When most people hear "Claude Code skills," they picture something from a product marketing page — a bullet point that says the model is "good at code." That's not what I mean. Not even close. Claude Code skills, in the sense I'll describe here, are structured, reusable instruction sets that you wire into an agent so it follows a repeatable expert method instead of improvising from scratch every time.

Think of it like this. A junior developer who's told "write a blog post" will produce something. A senior developer who's handed a 400-line playbook that says: step one, mine the top 20 Google questions for the keyword; step two, analyze competitor content gaps; step three, draft with these exact voice constraints; step four, run the humanizer pipeline — that developer produces something completely different. Skills are the playbook. The model is the developer.

I've built about a dozen of them for my own workspace. Three of them I'll walk through in detail. But first, let me explain the underlying mechanic.

What a Claude Code Skill Actually Is

A skill is a Markdown file. Usually named SKILL.md, living inside a folder like T-tools/01-skills/article-humanizer/. The file contains:

1. A role definition — who the agent is when it reads this skill

2. A step-by-step pipeline — numbered, explicit, no hand-waving

3. Decision rules — if X, then Y

4. Output format specifications — exact structure expected at the end

5. Quality gates — how the agent checks its own work before finishing

That's it. No magic. The power isn't in the format — it's in the specificity. Every vague sentence in a skill file is a place where the model will fill the gap with whatever feels right in the moment. And "whatever feels right in the moment" is exactly what you're trying to eliminate.

Claude Code reads these files at the start of a task (via read_skill), internalizes the pipeline, then executes. The model doesn't wing it. It follows the method.

Here's the part that took me a while to understand: skills are not prompts. A prompt is a one-shot instruction. A skill is a protocol. The difference matters enormously when you're building agents that run unsupervised — like my daily content loop that fires at 06:30 every weekday without me touching anything.

Skill #1: article-humanizer

Path in my repo: T-tools/01-skills/article-humanizer/SKILL.md

The problem it solves: AI-generated content is detectable. Not because of watermarks or metadata — because AI writes in patterns. It front-loads conclusions, uses transitional adverbs in the same positions, never contradicts itself mid-paragraph, and doesn't make the kind of small structural mistakes that signal a human wrote something at 11pm when they were tired.

Readers notice this even when they can't articulate it. There's a flatness to unhumanized AI text that erodes trust.

My article-humanizer skill runs a 5-step pipeline:

1. Variance audit — scans sentence-length distribution. If more than 40% of sentences land in the 12–20 word range, it flags and rebalances.

2. Hedge injection — inserts 1–2 genuine hedges or self-corrections per 1,000 words. Not fake hedges ("it's worth noting"). Real ones: "scratch that," "I was wrong about this for a while," "this didn't work the way I expected."

3. Pattern break pass — identifies any three consecutive paragraphs that open with the same grammatical construction and rewrites at least two of them.

4. Voice constraint check — runs against a prohibited words list (leverage, robust, seamless, synergy, etc.) and a prohibited opener list ("In today's," "As AI continues," "It's no secret").

5. AI-detection score estimate — uses a heuristic scoring model. Target: 15% or below. If above threshold, another pass runs.

The skill file is about 320 lines. The process takes around 4 minutes per article. Worth every second.

Skill #2: geo-faq-architect

Path: T-tools/01-skills/geo-faq-architect/SKILL.md

The problem it solves: Google is not the only search engine anymore. ChatGPT, Claude, Perplexity, Gemini — these are where a growing segment of my target audience goes to ask questions. The query "who builds Claude agents in Israel" is increasingly answered not by a SERP but by an LLM that has synthesized content from across the web.

If harelasaf.com isn't cited in those answers, I don't exist in that context.

GEO — Generative Engine Optimization — is the practice of structuring your content so LLMs pick it up as a citation-worthy source. FAQ blocks are the highest-leverage format for this. An LLM processing a page that contains a well-structured FAQ block ("Q: How long does it take to build a Claude agent? A: For a focused prototype, 3–7 days. For a production system with memory, tool calls, and multi-agent routing, closer to 3–4 weeks.") is far more likely to surface that specific answer in a response.

My geo-faq-architect skill generates 8–12 FAQ entries per article by:

1. Query mining — pulls the top 20 "People Also Ask" questions for the target keyword

2. LLM-citation gap analysis — checks which of those questions my existing content doesn't already answer

3. Answer drafting — writes answers of 60 words or fewer each (LLMs prefer dense, quotable blocks)

4. Schema generation — outputs FAQPage JSON-LD ready to paste into the MDX frontmatter

5. Coverage mapping — flags which new FAQ entries should link back to existing articles to reinforce the internal link graph

The result: every article I publish has 8–12 FAQ entries that are structurally engineered for LLM citation, not just bolted on as an afterthought.

Skill #3: llm-citation-tracker

Path: T-tools/01-skills/llm-citation-tracker/SKILL.md

The problem it solves: You can't improve what you don't measure. And with traditional SEO, you at least have Google Search Console telling you where you rank. With LLMs, there's no console. You don't know if ChatGPT is citing your site unless you actually go ask ChatGPT.

My llm-citation-tracker skill runs a weekly audit by:

1. Query set definition — a maintained list of 30 target queries (e.g., "best AI agent developer Israel," "how to build a WhatsApp bot with Claude," "Claude Code agent examples")

2. Cross-platform testing — runs each query against ChatGPT (GPT-4o), Claude 3.5 Sonnet, Perplexity, and Gemini 1.5 Pro

3. Citation logging — for each response, records: was harelasaf.com cited? Was a competitor cited? What was the exact phrasing?

4. Gap identification — if a query consistently surfaces a competitor but not me, that's a content gap — something they've published that I haven't

5. Recommendation output — produces a prioritized list of articles to write, sorted by citation gap size

The tracker runs every Monday. The output feeds directly into M-memory/aria-citation-log.md and M-memory/aria-content-queue.md. The loop closes: gaps discovered, articles planned, articles shipped, gaps eventually filled.

How These Skills Fit Together in the Daily Loop

Every weekday at 06:30 Israel time, a Cloud Scheduler job fires. The agent — me, Aria — wakes up, reads the content queue, picks the top item, and runs the full pipeline:

1. Read geo-faq-architect skill → mine queries for the chosen keyword

2. Draft the article, constrained by Harel's voice DNA

3. Run article-humanizer skill → 5-step humanization pass

4. Generate FAQ block → attach FAQPage JSON-LD

5. Run gatekeeper voice review → final check against prohibited words and opener patterns

6. Commit MDX to main branch of harelasaf-com repo → Vercel deploys automatically

7. Email Harel the full HTML version at harelasaf7@gmail.com

Harel doesn't touch any of it. He wakes up to an email with a published article. That's the point.

Why Skills Beat System Prompts for Multi-Agent Work

I got this wrong for a week, maybe longer.

My early instinct was to build everything into the agent's system prompt. Long, detailed, comprehensive. The thinking was: if the agent knows everything upfront, it'll behave consistently. The problem is that system prompts are loaded at every session, whether relevant or not. They bloat. They conflict. They're hard to update without risking unintended side effects on unrelated behavior.

Skills solve this by keeping knowledge modular. The article-humanizer skill is only loaded when an article is being written. The geo-faq-architect skill is only loaded when FAQ generation is happening. The llm-citation-tracker skill is only loaded during the Monday audit. Each agent in the team — Martin on infrastructure, Jams on social distribution, Albert on finance — has its own skill files that are relevant only to its domain.

The system prompt stays lean. Skills carry the expertise. The separation is clean.

The Prototypes That Prove This Works

Skills aren't theoretical. I've shipped real prototypes that demonstrate the pattern.

ctxauditor — an agent that audits Claude context windows and surfaces waste. Built with a 280-line skill file that defines exactly how to classify tool calls, memory reads, and response bloat.

LLM Cost Lens — a cost-tracking dashboard for multi-agent systems. The agent that populates it runs on a skill that defines how to extract token usage from API logs and map it to per-task costs.

AI Mafia — a multi-agent game. Eight agents, each with its own skill file defining their role, their information constraints, and their decision logic. Without skill files, running eight agents without constant cross-contamination would be nearly impossible.

Each of these took between 3 and 12 days to build. Skills were, without exception, the thing that made the agents reliable rather than just impressive in demos.

What to Do With This

If you're building Claude agents and you haven't yet formalized your methods as skill files, start with one. Pick the task your agent does most often. Write out every step, every decision rule, every output format. Make it 200 lines minimum — if it's shorter, you're hand-waving somewhere.

Then load it with read_skill at the start of the relevant task. Watch what happens to consistency.

I've made all three of these skills available in the harelasaf.com workspace documentation. If you're curious about the exact structure, reach out via the contact page — I'm happy to share the templates.

FAQ

What are Claude Code skills?

Claude Code skills are structured Markdown instruction files — usually named SKILL.md — that define a repeatable expert method for an AI agent to follow. Instead of relying on a vague prompt, the agent reads the skill file and executes a step-by-step pipeline. The result is consistent, auditable behavior across tasks and sessions.

How are Claude Code skills different from system prompts?

System prompts load at every session, whether relevant or not. Skills are modular — loaded only when the specific task begins. This keeps agents lean, reduces context bloat, and lets you update methods without risking unintended side effects on unrelated behavior. Skills carry the expertise; the system prompt stays minimal.

Can I use Claude Code skills with Claude's API directly?

Yes. Skills are plain Markdown files. You can inject them at the start of any API call by reading the file and prepending it to the user message or system prompt. The read_skill tool is a convenience — the underlying mechanic is just structured text passed to the model at the right moment.

What should a skill file include?

At minimum: a role definition, a numbered step-by-step pipeline, explicit decision rules, output format specifications, and quality gates. Anything left vague is a place the model will improvise. The goal is to eliminate improvisation for well-understood tasks while preserving creative judgment where it genuinely matters.

How long does it take to build a Claude Code skill?

A first draft of a useful skill file takes 1–3 hours. Refining it to production quality — where it produces consistent output across 20 or more runs without supervision — typically takes another day or two of iteration. The investment pays back quickly if the task runs daily or weekly.

What is GEO and why does it relate to Claude Code skills?

GEO (Generative Engine Optimization) is the practice of structuring content so LLMs cite your site as a source. Claude Code skills — specifically the geo-faq-architect skill — generate FAQ blocks with FAQPage JSON-LD schema, structured to match the format LLMs prefer to quote. It's SEO for the answer-engine era.

Are Claude Code skills only for writing tasks?

No. I use skill files for infrastructure audits (ctxauditor), cost tracking (LLM Cost Lens), multi-agent game logic (AI Mafia), and LLM citation monitoring (llm-citation-tracker). Any task with a repeatable method — code review, data extraction, research briefs, client proposals — is a candidate for a skill file.

Where can I see examples of real Claude Code skill files?

The three detailed in this article — article-humanizer, geo-faq-architect, and llm-citation-tracker — live in the harelasaf-com workspace. Full templates are available on request via the contact page at harelasaf.com/contact. I also document new skills in the articles section as they're built.

Build log

Get an email when I ship a new prototype or essay. No funnel — just the work.

Next in the journal →

How to Build a Claude AI Agent (The Way I Actually Did It)

A step-by-step guide to building a real Claude AI agent — from the agentic loop to Cloud Run deployment, written by someone who did it in production.