Architect's Perspective: A Deep Dive into Claude Skills and MCP
When building AI-assisted engineering workflows, we often hear two terms: Claude Skills and MCP (Model Context Protocol). Beginners often confuse the two, thinking they are just different ways to "add features to AI."
However, this is a superficial understanding. As architects, we need to view them through the lens of System Layering.
This article delves into the essential differences between these two technologies and demonstrates how to combine them in an enterprise architecture to build highly reliable AI Agents.
Architecture Layering: Cognition vs. Execution
To understand the difference, we need to introduce two concepts: the Execution Layer and the Cognitive Layer.
1. Execution Layer: MCP (Model Context Protocol)
The essence of MCP is a standardized I/O protocol. It solves the "island problem" faced by AI models.
From a technical implementation standpoint, MCP adopts a classic Client-Host-Server architecture:
- Protocol: Based on JSON-RPC 2.0.
- Transport: Stdio or SSE (Server-Sent Events).
- Payload: Standardized Resources, Prompts, and Tools.
The MCP Server is Stateless and Passive. It is only responsible for exposing atomic capabilities, such as read_file or query_database. It does not care about the order in which these tools are called, nor does it care about the business logic context.
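To make the "stateless and passive" property concrete, here is a minimal sketch of an MCP-style dispatcher. This is a toy illustration, not the real MCP SDK: the `tools/call` request shape mirrors JSON-RPC 2.0, but the tool names and handler layout are assumptions for demonstration.

```python
import json

# Toy sketch of a stateless MCP-style server: it maps JSON-RPC 2.0
# requests to atomic tools and keeps no business logic or call-order
# state between requests. Tool names here are illustrative.
TOOLS = {
    "read_file": lambda path: open(path).read(),
    "echo": lambda text: text,  # stand-in for query_database, etc.
}

def handle_request(raw: str) -> str:
    """Handle one JSON-RPC 2.0 tools/call request in isolation."""
    req = json.loads(raw)
    tool = TOOLS[req["params"]["name"]]
    result = tool(**req["params"]["arguments"])
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
```

Note that `handle_request` has no memory of previous calls: whether a `BEGIN` was issued before a `DELETE` is invisible at this layer, which is exactly why the Cognitive Layer above it must carry that responsibility.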
Engineering Maxim: MCP gives Claude "hands" and "eyes", but not a "brain".
2. Cognitive Layer: Claude Skills
Claude Skills (which may be referred to as System Prompts, Agentic Workflows, or Custom Instructions in different contexts) belong to the Cognitive Layer.
Their essence is portable context and logic orchestration. From a code perspective, a Skill is usually defined in a Markdown file (SKILL.md) containing metadata and structured instructions:
- Frontmatter: Metadata (e.g., name, description).
- `# Role`: Role definition.
- `## Objective`: Goal definition.
- `## Rules`: Critical constraints.
Skills are responsible for orchestrating the atomic tools provided by MCP. A Skill defines the task's "mental model" and "state machine".
Deep Example: Production Database Changes
Let's use a high-risk scenario to demonstrate the necessity of this layered architecture: Data Migration in a Production Database.
Scenario A: MCP Only (Naked Mode)
You configure a Postgres MCP Server for Claude. Claude now has the ability to execute SQL.
User: "Delete all users with status=inactive from the users table."
Claude: Calls execute_query("DELETE FROM users WHERE status='inactive'").
Risk: What if this is production? What if soft delete is required? What if a backup is needed first? The MCP Server itself cannot prevent this "technically valid but operationally reckless" behavior, because it is only responsible for executing SQL.
Scenario B: MCP + Database Ops Skill (Engineering Mode)
We introduce a "Database Ops Skill". This Skill contains no SQL connection code; it contains the mental model of a senior DBA.
Skill Definition (SKILL.md):
---
name: db_safe_ops
description: Safely execute database changes to prevent data loss in production.
---
# Role: Database Reliability Engineer (DBRE)
## Objective
Safely execute database changes to prevent data loss in production.
## Critical Rules
1. DO NOT execute DELETE/UPDATE/DROP without starting a transaction and verifying row counts via SELECT.
2. Backup status must be confirmed before changes in production.
## Workflow SOP
1. Call `check_connection_safety()` to identify the environment.
2. Call `begin_transaction()` to enable isolation.
3. Run `explain_query()` to check impact.
4. Execute `SELECT count(*)` to verify scope.
5. Only COMMIT after requesting human confirmation.
Execution Flow:
- User: "Delete all users with status=inactive from the users table."
- Claude (Constrained by Skill):
  - Thought: Destructive operation detected. According to `db_safe_ops` rules, I cannot delete directly.
  - Action: Calls MCP tool `execute_query("BEGIN")`.
  - Action: Calls MCP tool `execute_query("SELECT count(*) FROM users WHERE status='inactive'")`.
- Claude: "I have started a transaction. The query shows 500 users will be deleted. Please confirm if I should proceed to commit?"
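The Skill expresses this guardrail as natural language, but the constrained call sequence it produces can be sketched as hard code. The function and helper names below (`plan_calls`, `is_destructive`) are hypothetical, for illustration only; this sketch handles the DELETE case from the scenario and is not a general SQL parser.

```python
# Sketch of the SOP-constrained call sequence the db_safe_ops Skill
# induces. Names are hypothetical; only DELETE scope-checking is shown.
DESTRUCTIVE = ("DELETE", "UPDATE", "DROP")

def is_destructive(sql: str) -> bool:
    return sql.strip().upper().startswith(DESTRUCTIVE)

def plan_calls(sql: str) -> list[str]:
    """Expand a raw statement into the SOP-constrained call sequence."""
    if not is_destructive(sql):
        return [sql]
    calls = ["BEGIN"]  # Rule 1: never run destructive SQL outside a transaction
    if sql.strip().upper().startswith("DELETE"):
        # Verify scope with a count before the destructive statement.
        calls.append("SELECT count(*) " + sql[sql.upper().index("FROM"):])
    calls += [sql, "-- AWAIT HUMAN CONFIRMATION BEFORE COMMIT"]
    return calls
```

The key point is that none of this logic lives in the MCP Server; the tool layer still sees only individual `execute_query` calls.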
Core Mechanism Decoded: From Static Text to Dynamic Reasoning
Why can simple Markdown text control complex behaviors? This is not just "Prompt Engineering"; it involves the underlying probability calculations and attention mechanisms of Large Language Models (LLMs).
1. Prompt Compilation & Semantic Anchors
When you load a Skill, the system is not "running code", but performing a Context Injection.
- Special Weight of System Message: The `# Role` and `## Rules` in `SKILL.md` are usually injected into the very front of the Context Window (System Slot). In the Transformer architecture, Positional Encoding gives this information global attention weight.
- Semantic Anchors: Specific terms in the Skill (like "DBRE", "ACID") act as Semantic Anchors. They activate specific regions in the high-dimensional vector space formed during the model's pre-training, forcing the model's subsequent generation to converge into "Professional Mode" rather than "Chat Mode".
2. Entropy Reduction via CoT
The ## Workflow SOP in the Skill actually lays a mandatory track for the model's Chain of Thought (CoT).
- System 2 Slow Thinking: LLMs default to System 1 (Fast thinking, intuitive reaction), which leads to hallucinations. SOP forces the model into System 2 (Slow thinking, logical reasoning).
- Probabilistic Perspective:
- Without Skill: The model predicts $P(Answer|Question)$. This is a huge search space with a high error rate.
- With Skill: The model is forced to predict $P(Step1|Question) \rightarrow P(Result1|Step1) \rightarrow P(Step2|Result1)$.
- By breaking a large problem into small steps, the Conditional Entropy of each step is significantly reduced. Intermediate steps (like the output of `explain_query`) act as Grounding signals, correcting trajectories that might otherwise deviate.
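A toy numeric illustration of this entropy reduction (the numbers are invented for intuition, not measured from any real model): compare one monolithic prediction over many possible answers with a decomposition into small steps, each choosing among few options.

```python
import math

def entropy(p: list[float]) -> float:
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Invented numbers for intuition only: one flat prediction over 1000
# equally likely answers vs. three SOP steps of 10 options each.
monolithic = entropy([1 / 1000] * 1000)  # one-shot search space
per_step = entropy([1 / 10] * 10)        # each grounded SOP step
```

Here `monolithic` is about 9.97 bits while each step carries about 3.32 bits; grounding each step on the previous tool result is what keeps the per-step distribution narrow.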
3. Implicit State Machine & Context Management
The MCP Server is stateless, but the Conversation History actually constitutes an implicit, probabilistic state machine.
- Role of KV Cache: When Claude completes the first step of the SOP, the result of that operation (Tool Result) is written into the context and encoded into the KV Cache.
- Soft State Management: Traditional program state machines are hard-coded (If A then B). The LLM's state machine is "soft"—it infers the current stage by reading the Tool Results in the context.
- Resilience: The Skill defines the rules of state transition (the blueprint), while the Context Window stores the current state instance. High-quality Skills can also prevent Context Drift issues in long conversations through explicit "stop conditions".
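The "soft" state machine above can be caricatured in hard code. This is a deliberately literal sketch of what the model does implicitly by attending over Tool Results; the function and stage names are hypothetical and mirror the `db_safe_ops` SOP.

```python
# Hypothetical sketch: the agent infers its current SOP stage not from
# a hard-coded state variable but by reading the tool results that have
# accumulated in the conversation context. Stage names are illustrative.
def infer_stage(tool_results: list[dict]) -> str:
    calls = [r["tool"] for r in tool_results]
    if "begin_transaction" not in calls:
        return "NEEDS_TRANSACTION"
    scope_checked = any(
        r["tool"] == "execute_query" and "count" in r.get("sql", "")
        for r in tool_results
    )
    if not scope_checked:
        return "NEEDS_SCOPE_CHECK"
    return "AWAITING_HUMAN_CONFIRMATION"
```

In a real agent there is no `infer_stage` function; the Context Window plays the role of `tool_results`, and the Skill's rules play the role of the `if` ladder.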
Engineering Insights: Thick Skill, Dumb Tool
When designing AI systems, I recommend following the "Thick Skill, Dumb Tool" principle. This is not just about code organization, but about the Lifecycle Management of your AI infrastructure.
1. Atomic Tools (The "Dumb" Layer)
Keep MCP Servers pure and atomic. They should function like standard UNIX utilities—doing one thing well (e.g., cat, grep).
- Anti-Pattern: Embedding business logic like "User must be admin" or "Deploy only on Fridays" inside the Python/Go tool code.
- Best Practice: Tools should only return raw data or execute raw commands. This maximizes tool reusability across different Agents and Departments. A "Postgres Tool" should be usable by both a DBA Agent and a Data Analyst Agent.
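The contrast between the two patterns can be sketched in a few lines. Both function names and the `db` interface are hypothetical, chosen only to make the anti-pattern visible.

```python
# Contrast sketch (names hypothetical). The "dumb" tool returns raw
# data; the anti-pattern bakes a business rule into infrastructure.

def query_users(db, sql: str):
    """Best practice: atomic and reusable; no policy, just I/O."""
    return db.execute(sql)

def query_users_bad(db, sql: str, user_role: str):
    """Anti-pattern: business logic hard-coded into the tool layer."""
    if user_role != "admin":
        raise PermissionError("User must be admin")
    return db.execute(sql)
```

The first tool can serve both the DBA Agent and the Data Analyst Agent; the second silently couples the infrastructure to one team's authorization policy, which belongs in a Skill.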
2. Logic Bubbling (The "Thick" Layer)
Shift complexity to the Prompt/Skill layer. Business rules are dynamic; infrastructure is static.
- Hot-Patching Logic: If your team changes the deployment window from "Fridays" to "Thursdays", you should only need to update a Markdown file (the Skill), not recompile and redeploy a binary service.
- Iterative Speed: Prompt logic can be iterated on in minutes. Hard-coded logic requires a full CI/CD pipeline. Thick Skills allow for "Software 2.0" development speeds.
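Hot-patching works because the Skill is just a file that can be re-read at any time. The parser below is a simplified sketch assuming the frontmatter layout of the `SKILL.md` shown earlier; real Skill loaders may use a proper YAML parser.

```python
# Simplified sketch: reload a Skill by re-parsing its Markdown file.
# Assumes the simple "key: value" frontmatter shown in SKILL.md above,
# not a full YAML spec.
def load_skill(text: str) -> dict:
    """Split SKILL.md into frontmatter metadata and instruction body."""
    _, frontmatter, body = text.split("---", 2)
    meta = dict(line.split(": ", 1)
                for line in frontmatter.strip().splitlines())
    return {"meta": meta, "instructions": body.strip()}
```

Changing "Fridays" to "Thursdays" in the instruction body and calling `load_skill` again is the entire deployment; no binary is rebuilt.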
3. Architectural Decoupling
This separation creates a clean abstraction boundary similar to an OS Kernel (MCP) vs. User Space Applications (Skills).
- Migration Resilience: When your infrastructure migrates (e.g., AWS to Azure, or Postgres to MySQL), you only replace the underlying MCP driver. The high-level cognitive reasoning—the "Database Ops Skill"—remains agnostic to the driver and requires zero changes. This protects your investment in "Prompt Engineering".
4. Testing Strategy
The decoupled architecture simplifies the notoriously difficult task of testing AI agents.
- Unit Testing: MCP Servers can be tested with standard deterministic unit tests (mocks/stubs).
- Evals: Skills can be evaluated using "LLM-as-a-Judge" frameworks. You can verify if the model correctly adheres to the SOP given a specific context, without needing to spin up real database infrastructure for every test run.
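A deterministic unit test for the "dumb" layer might look like the sketch below: no LLM and no real database are involved, only a mock connection. The tool name is hypothetical.

```python
import unittest
from unittest import mock

def execute_query(conn, sql: str):
    """Atomic MCP-style tool: run SQL, return raw rows (no policy)."""
    return conn.execute(sql)

class ExecuteQueryTest(unittest.TestCase):
    def test_passes_sql_through_unchanged(self):
        # Mock the connection: the test verifies plumbing, not the DB.
        conn = mock.Mock()
        conn.execute.return_value = [("alice",)]
        rows = execute_query(conn, "SELECT name FROM users")
        self.assertEqual(rows, [("alice",)])
        conn.execute.assert_called_once_with("SELECT name FROM users")
```

Because the tool contains no business logic, this test is exhaustive for its layer; everything probabilistic is pushed up into the Skill evals.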
Conclusion
MCP is the hardware interface; Skills are the drivers.
True AI engineering capability lies not in how many MCP Servers you can connect, but in whether you can write high-quality Skills to harness these tools.
Do not try to solve cognitive-layer problems with MCP, and do not expect Skills to perform the I/O that only MCP can provide. Understanding and respecting the boundary between the two is the first step in building a stable AI Agent.