
AWS Agent Toolkit: Hands-On with Cost Skills in Claude Code

By Fabio Douek

Explain (TLDR) like I am...

Imagine your robot helper got a new toolbox from AWS, and inside there are little cards that teach it how to build things in the cloud without breaking them. One of the cards even tells the robot to check the price tag before it builds anything, so nobody gets surprised by a big bill.

The cards live in a backpack the robot wears, and AWS keeps the backpack tidy so the robot does not get confused. Grown-ups like that, because the robot now asks "is this okay" before it does anything expensive, and it shows the price first instead of guessing.

Treat the Agent Toolkit as a vendor-managed integration that an AI agent uses to act inside the customer's AWS account. The relevant diligence questions are scope of IAM permissions granted to the proxy, region of the managed MCP endpoint for data residency, and audit trail through CloudTrail and CloudWatch.

The cost-estimation skill is material because it is wired to the AWS Pricing API and runs before infrastructure is provisioned, which creates a documented checkpoint. Note the vendor critique that agents may default to AWS-native services without comparing alternatives, which is a recommendation pattern the organization should review on its own.

Think of this as a structured intervention for a specific symptom: AI agents writing AWS IaC that deploys cleanly but costs more than anyone expected. The treatment is a managed MCP server plus a set of skills that force the agent to query the AWS Pricing API before emitting any infrastructure code.

The mechanism is plausible and the early evidence from third-party walkthroughs is consistent: small workloads get cost estimates in the tens of cents per month, which match the AWS Pricing Calculator closely. Side effects to monitor include over-trust in the agent's architecture choices and a 2 percent context-window cost from the skills it loads.

Notice what happens when an engineer can finally ask "what will this cost" and get an answer in the same chat where they sketched the architecture. The relief is real, and the conversation about trade-offs starts from a number instead of a guess.

The new friction lands in trust: agents recommending AWS-native options without comparing alternatives, and skills that quietly consume context window. Worth naming on the team: this is a partner that talks the AWS dialect fluently, not a neutral architect, so the human still owns the multi-cloud judgement call.

Treat the toolkit as a session player who has finally read the AWS chart. It holds time on the boring rhythm parts, the API calls and pricing lookups, so the band can focus on the song. The cost skill is the metronome: it sets a tempo before anyone hits record.

The catch is feel: it locks tightly into Claude Code, Codex, and Kiro, but it plays AWS songs only. Once the ensemble is set, a five-stage workflow keeps the dynamics sane, sketch, refine, estimate, deploy, validate, instead of one long improvisation that runs over the bar lines.

The story is time-to-deployment with cost honesty baked in. Three install commands in Claude Code, a managed MCP server with IAM guardrails, and a cost estimate that runs before IaC is written. Third-party walkthroughs report monthly figures within cents of the AWS Pricing Calculator on small workloads.

Positioning: not a replacement for FinOps, a reasoning layer that makes the first cost number visible at the moment of design. The angle that travels for buyers is the five-stage workflow with a confirmation gate after every stage, especially the cost gate.


Overview

On May 6, 2026, AWS announced the Agent Toolkit for AWS, an umbrella suite that bundles three things AI coding agents need to work safely against AWS: a managed MCP server, a set of agent plugins, and 40+ pre-built agent skills. It is the official AWS-supported home for this tooling, and supersedes the earlier AWS Labs experiments. The toolkit itself is free; you pay only for the AWS resources your agents create.

The architecture has three layers:

  • Agent Skills: short, prescriptive procedures that teach the agent how to do specific AWS things correctly: which API to call, which arguments to use, which gotchas to avoid.
  • AWS MCP Server: a managed remote endpoint at https://aws-mcp.us-east-1.api.aws/mcp (also eu-central-1) that exposes 15,000+ AWS API operations through four MCP tools, sandboxed Python execution, and IAM-based authentication translated from OAuth 2.1 by the open-source mcp-proxy-for-aws.
  • Agent Plugins: bundles that wire skills, MCP server access, hooks, and reference docs together for a workflow.

The toolkit ships three plugins at launch:

Plugin | What it covers
--- | ---
aws-core | The “start here” plugin. Registers the aws-mcp server and bundles skills for CDK, CloudFormation, serverless, containers, IAM, observability, the SDKs, and aws-billing-and-cost-management: the cost story this post focuses on.
aws-data-analytics | Skills and references for analytics workloads on AWS (Glue, Athena, EMR, Redshift, Kinesis, MSK, OpenSearch) for data-engineering and pipeline-building agents.
aws-agents | For building agents on AWS rather than acting against it. Wires Bedrock AgentCore, Strands, foundation-model selection, and the AgentCore runtime/memory/identity primitives.

All three plugins install into Claude Code, Codex, and Kiro, each with the same skills and the same aws-mcp server. The walkthroughs below use Claude Code.

This post focuses on aws-core. Its .mcp.json wires the aws-mcp endpoint on install, and its aws-billing-and-cost-management skill talks to Cost Explorer, Budgets, the Price List API, and Compute Optimizer through it.

Setup

Setup in Claude Code takes about five minutes if you already have AWS credentials configured. Installing the aws-core plugin is the only step you need: it bundles the MCP server registration alongside the skills.

Prerequisites

  • A current version of Claude Code (the /plugin and remote MCP commands referenced below need a recent build).
  • uv installed locally; the plugin’s .mcp.json runs uvx mcp-proxy-for-aws@latest to spawn the proxy.
  • AWS credentials available locally (the proxy picks them up via the standard SDK chain).
  • For the managed MCP server, an IAM principal with permissions for the actions your agent will perform. AWS recommends scoping with SCPs and a dedicated agent role.

Step 1: Add the marketplace and install aws-core

Inside Claude Code:

/plugin marketplace add aws/agent-toolkit-for-aws
/plugin install aws-core@agent-toolkit-for-aws

[Screenshot: Claude Code's /plugin Discover view showing aws-core v1.0.0 from the agent-toolkit-for-aws marketplace, with the install-scope picker (user / project / local) open]

Three install scopes: user (across all your projects), project (shared via the repo), local (this project, just you). Start with user.

After the install completes, Claude Code prompts you to apply the changes:

/reload-plugins

Without this, the bundled aws-mcp server and the aws-core skills are installed on disk but not active in the current session.

💡 Pro tip: swap aws-core for aws-agents or aws-data-analytics to add the other two plugins from the same marketplace.

Step 2: Verify the bundled MCP server

aws-core ships its own .mcp.json that registers aws-mcp for you on install:

{
  "mcpServers": {
    "aws-mcp": {
      "command": "uvx",
      "args": [
        "mcp-proxy-for-aws@latest",
        "https://aws-mcp.us-east-1.api.aws/mcp"
      ]
    }
  }
}

The proxy package is published on PyPI and runs as a stdio server locally; the actual API calls go to the regional managed endpoint. Swap us-east-1 for eu-central-1 if you want EU residency.
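As a minimal sketch of that region swap, the helper below rebuilds the same .mcp.json shape for either endpoint. The names build_mcp_config and SUPPORTED_REGIONS are illustrative assumptions, not part of the toolkit; the URL pattern is inferred from the two endpoints named in this post.

```python
import json

# ASSUMPTION: only the two regions named in this post are accepted.
SUPPORTED_REGIONS = ("us-east-1", "eu-central-1")


def build_mcp_config(region: str) -> dict:
    """Return the aws-mcp server entry pointed at the given regional endpoint."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region: {region}")
    return {
        "mcpServers": {
            "aws-mcp": {
                "command": "uvx",
                "args": [
                    "mcp-proxy-for-aws@latest",
                    f"https://aws-mcp.{region}.api.aws/mcp",
                ],
            }
        }
    }


if __name__ == "__main__":
    # Print the EU variant so it can be compared with the bundled file.
    print(json.dumps(build_mcp_config("eu-central-1"), indent=2))
```

Rejecting unknown regions up front keeps a typo from turning into a silent connection failure later.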

Verify that it connected by running /mcp inside Claude Code (not claude mcp list from the shell):

/mcp

You should see plugin:aws-core:aws-mcp listed under Built-in MCPs with a green check and a tool count.

[Screenshot: Claude Code's /mcp Manage MCP servers view showing plugin:aws-core:aws-mcp connected with 11 tools under Built-in MCPs]

The server exposes tools including call_aws (any of the 15,000+ AWS API operations), search_documentation, read_documentation, and run_script (sandboxed Python with no network or filesystem access, used for things like cost arithmetic).

Step 3: What’s inside aws-core

aws-core registers the aws-mcp server above and a set of skills covering CDK, CloudFormation, serverless, containers, IAM, observability, the SDKs, and the one I care about for this post: aws-billing-and-cost-management. The skill’s SKILL.md encodes two rules worth lifting verbatim:

  • Always check the current date. Before any Cost Explorer, Budgets, or Savings Plans call, the agent must determine the actual current date. LLMs default to dates from their training data and silently produce stale analyses.
  • Deterministic calculations only. No arithmetic in the model’s response. Math runs through run_script (or a local Python script if aws-mcp is not configured), because LLM arithmetic is unreliable on cost data.

Those two rules are why a managed MCP server with a sandboxed Python tool matters for cost work, and why this is bundled rather than left as separate pieces.
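To make the two rules concrete, here is a rough sketch in plain Python of what they amount to: take the reporting window from the system clock, and do money math with Decimal instead of floats or mental arithmetic. The function names last_full_months and percent_change are hypothetical and do not come from SKILL.md.

```python
from datetime import date
from decimal import Decimal


def last_full_months(today: date, months: int) -> tuple[str, str]:
    """Return (start, end) ISO dates covering the last `months` complete months.

    `end` is the first day of the current month, which suits APIs that treat
    the end date as exclusive (Cost Explorer's TimePeriod does).
    """
    end = today.replace(day=1)
    # Walk back across year boundaries without third-party date libraries.
    year, month = end.year, end.month - months
    while month < 1:
        month += 12
        year -= 1
    return date(year, month, 1).isoformat(), end.isoformat()


def percent_change(previous: str, current: str) -> Decimal:
    """Month-over-month change in percent, computed with exact decimal math."""
    prev, curr = Decimal(previous), Decimal(current)
    if prev == 0:
        raise ValueError("previous amount is zero; percent change is undefined")
    return ((curr - prev) / prev * 100).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # date.today() reads the system clock rather than a remembered year.
    print(last_full_months(date.today(), 3))
    print(percent_change("120.00", "150.00"))
```

Amounts are passed as strings because that is how Cost Explorer returns them, and converting straight to Decimal avoids float rounding along the way.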

Testing

Both tests below exercise the same skill, aws-billing-and-cost-management, but on different branches of its decision tree. Its decision guide maps natural-language questions to the right AWS API:

Question | Tool
--- | ---
What am I spending? Where are costs going up? | Cost Explorer
How much does a service cost? | Price List API
Where can I save money? (start here) | Cost Optimization Hub
Should I buy Savings Plans? | CE SP Recommendations
Should I buy Reserved Instances? | CE RI Recommendations
Deep-dive on a specific EC2 / Lambda / EBS / RDS rec? | Compute Optimizer
How do I set up budget alerts? | Budgets
What’s causing a cost spike? | Cost Anomaly Detection
Am I within Free Tier? | Free Tier API
How do I reduce my bill? | Cost Audit workflow
How do I query detailed billing data? | CUR 2.0 + Athena
How do I optimize specific services? | Per-service patterns
How do I scope costs to a billing view? | Billing Views

Test 1 hits the Cost Explorer row (look at what already happened); Test 2 hits the Price List API row (estimate something that has not been deployed yet).

Test 1: “What am I spending?” via Cost Explorer

I asked Claude Code a deliberately vague question, “What am I spending? Where are costs going up?”, to see how the plugin routed it.

[Screenshot: Claude Code answering "What am I spending? Where are costs going up?": AskUserQuestion clarifies the cost source as AWS, the aws-billing-and-cost-management skill loads, aws-mcp is called twice, and the response is a monthly totals table with an April spike]

The flow on screen, in order:

  1. Clarify scope. AskUserQuestion offered AWS / Anthropic API / Cloudflare. The Cloudflare option was not a guess: I have the Cloudflare MCP servers and skills installed in this Claude Code session alongside aws-core, so the agent inferred all three were plausible cost sources from the available tools. I picked AWS.
  2. Load the skill. Skill(aws-core:aws-billing-and-cost-management) loaded the decision guide and the two non-negotiable rules: check today’s date first, never do arithmetic in the response.
  3. Pin the date. The agent stated “Today is 2026-05-08” before any API call, instead of inferring a year from training data.
  4. Two MCP calls. plugin:aws-core:aws-mcp was invoked twice. The first attempt at ce.GetCostAndUsage failed validation because boto3 parameters were passed at the top level instead of inside a params dict; the agent corrected and the second call succeeded. Both calls ran inside the same run_script sandbox so the totals and month-over-month deltas were computed in Python, not by the model.
  5. Return a defensible answer. The response listed monthly totals, top services, and the biggest month-over-month (MoM) movers. The agent also flagged ~$46 of April spend it could not account for in the top 15, rather than rounding it away.

That is the loop the plugin is designed to enforce: clarify → load skill → date → typed API call → sandboxed math → answer with caveats.
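For readers who want to see the "typed API call → sandboxed math" part outside the agent, the sketch below makes the equivalent call with plain boto3 and computes month-over-month deltas with Decimal. It is not how call_aws or run_script wrap the request (that wrapper shape is not shown in this post); the helper names are hypothetical, pagination via NextPageToken is ignored, and the sample numbers are made up.

```python
from decimal import Decimal


def fetch_monthly_costs(start: str, end: str) -> dict:
    """Call Cost Explorer; needs boto3 and credentials with ce:GetCostAndUsage."""
    import boto3  # imported lazily so the pure helpers below work without it

    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},  # End is exclusive
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
    )


def monthly_totals(response: dict) -> list[tuple[str, Decimal]]:
    """Extract (month_start, amount) pairs from a GetCostAndUsage response."""
    return [
        (r["TimePeriod"]["Start"], Decimal(r["Total"]["UnblendedCost"]["Amount"]))
        for r in response["ResultsByTime"]
    ]


def month_over_month(totals: list[tuple[str, Decimal]]) -> list[tuple[str, Decimal]]:
    """Return (month, delta versus previous month) for each month after the first."""
    return [
        (curr_month, curr - prev)
        for (_, prev), (curr_month, curr) in zip(totals, totals[1:])
    ]


if __name__ == "__main__":
    # Made-up response in the documented shape, so this runs without AWS access.
    sample = {
        "ResultsByTime": [
            {"TimePeriod": {"Start": "2026-03-01", "End": "2026-04-01"},
             "Total": {"UnblendedCost": {"Amount": "100.10", "Unit": "USD"}}},
            {"TimePeriod": {"Start": "2026-04-01", "End": "2026-05-01"},
             "Total": {"UnblendedCost": {"Amount": "146.35", "Unit": "USD"}}},
        ]
    }
    print(month_over_month(monthly_totals(sample)))
```

Keeping the network call separate from the arithmetic means the math can be checked against a saved response before any credentials are involved.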

Test 2: Pre-deployment estimate via the Price List API

A typical “estimate the monthly cost of a small Lambda + API Gateway + DynamoDB workload” prompt routes through the Price List branch. The skill prescribes the same shape every time: call_aws against the Pricing API for each service, with the right ServiceCode and usagetype filter, then run_script to sum the components. The skill’s reference files include the filter strings most models get wrong on a first attempt, which is the practical reason it exists: without those references, Claude would guess service codes, and the Pricing API is unforgiving about wrong codes.
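A rough sketch of that shape with plain boto3 follows: build a TERM_MATCH filter on usagetype, call GetProducts, and pull the On-Demand USD unit prices out of each result. The helper names are hypothetical, any ServiceCode or usagetype value you pass in is your own assumption to verify, and this is not the skill's actual reference table.

```python
import json
from decimal import Decimal


def usagetype_filter(value: str) -> list[dict]:
    """Build the Filters argument for an exact match on the usagetype attribute."""
    return [{"Type": "TERM_MATCH", "Field": "usagetype", "Value": value}]


def fetch_price_list(service_code: str, usagetype: str) -> list[str]:
    """Call the Pricing API; needs boto3 and pricing:GetProducts permission."""
    import boto3  # imported lazily so the parser below works without it

    # The Pricing API is served from a small set of regions; us-east-1 is one.
    pricing = boto3.client("pricing", region_name="us-east-1")
    response = pricing.get_products(
        ServiceCode=service_code,
        Filters=usagetype_filter(usagetype),
    )
    # A filter that matches nothing yields an empty list here, not an error.
    return response["PriceList"]


def on_demand_usd_prices(price_list_item: str) -> list[Decimal]:
    """Collect every On-Demand USD unit price from one PriceList entry.

    In boto3 each PriceList entry arrives as a JSON string.
    """
    product = json.loads(price_list_item)
    prices = []
    for term in product.get("terms", {}).get("OnDemand", {}).values():
        for dimension in term.get("priceDimensions", {}).values():
            usd = dimension.get("pricePerUnit", {}).get("USD")
            if usd is not None:
                prices.append(Decimal(usd))
    return prices
```

An empty result is the signal to revisit the filter string before doing any arithmetic on the prices.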

[Screenshot: Claude Code estimating a small Lambda + API Gateway + DynamoDB workload: four aws-mcp calls hit the Price List API, results are summed in run_script, and the agent returns a $4.87/mo breakdown table dominated by API Gateway]

Same loop as the Cost Explorer test, different API. The agent fixed assumptions for a “small” workload (1M Lambda invocations at 128 MB / 200 ms, 1M API GW REST requests, 1M DynamoDB reads + writes, 1 GB storage), then made four aws-mcp GetProducts calls (one per service component) to pull live unit prices. The first attempt used a guessed usagetype filter and returned empty for API Gateway and DynamoDB; the agent corrected the filters and re-queried. The arithmetic ran in run_script, not in the response, and the final table totals $4.87/mo with API Gateway as the dominant line at ~72% of the bill. Free Tier was deliberately not applied so the number reflects what an account at scale would actually pay; with Free Tier the same workload lands at ~$0.75/mo.
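The kind of arithmetic that belongs in run_script can be sketched as below, using the workload assumptions from the paragraph above for quantities. The unit prices are placeholder assumptions, not live Price List data, and the DynamoDB lines are left out for brevity, so the total is not expected to reproduce the $4.87 figure.

```python
from decimal import Decimal

# ASSUMPTION: placeholder unit prices in USD, not values fetched from the Pricing API.
PRICE_PER_LAMBDA_REQUEST = Decimal("0.0000002")
PRICE_PER_LAMBDA_GB_SECOND = Decimal("0.0000166667")
PRICE_PER_API_REQUEST = Decimal("0.0000035")


def lambda_gb_seconds(invocations: int, memory_mb: int, duration_ms: int) -> Decimal:
    """Convert invocations, memory, and duration into billable GB-seconds."""
    return Decimal(invocations) * Decimal(memory_mb) / 1024 * Decimal(duration_ms) / 1000


def summarize(items: dict[str, Decimal]) -> tuple[Decimal, dict[str, Decimal]]:
    """Return the total and each line's share of it in percent."""
    total = sum(items.values(), Decimal("0"))
    shares = {
        name: (cost / total * 100).quantize(Decimal("0.1"))
        for name, cost in items.items()
    }
    return total, shares


def estimate(invocations: int, memory_mb: int, duration_ms: int, api_requests: int):
    """Price the Lambda and API Gateway lines of the workload."""
    items = {
        "lambda_requests": Decimal(invocations) * PRICE_PER_LAMBDA_REQUEST,
        "lambda_compute": lambda_gb_seconds(invocations, memory_mb, duration_ms)
        * PRICE_PER_LAMBDA_GB_SECOND,
        "api_gateway_requests": Decimal(api_requests) * PRICE_PER_API_REQUEST,
    }
    return summarize(items)


if __name__ == "__main__":
    total, shares = estimate(1_000_000, 128, 200, 1_000_000)
    print(total.quantize(Decimal("0.01")), shares)
```

Every figure in the output traces back to a named constant or an input, which is the property the "no arithmetic in the response" rule is after.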

Great power, great responsibility

Everything above showcases the skills doing the right thing. Here is the part that should make you uncomfortable: the bundled aws-mcp server is a managed proxy in front of the full AWS API. Its call_aws tool can invoke any of 15,000+ operations, and run_script executes Python with boto3. The skills add discipline on top, but the underlying tool will do whatever the calling IAM principal is allowed to do.

For the cost and troubleshooting workflows this post just demonstrated, read-only credentials are almost always enough. AWS publishes managed policies (ReadOnlyAccess, AWSBillingReadOnlyAccess, AWSCostExplorerReadOnlyAccess) that cover the entire decision-guide table without granting a single mutating permission. Save broad write access for sessions where you are consciously asking the agent to build.

While writing this post I gave the agent broad credentials and a one-line “YOLO” prompt to see what would happen. Don’t do what I just did:

> create a ec2 t3.micro in us-east-1 just for testing, no ssh key. you can
  decide vpc/subnet. configure session manager

  Called plugin:aws-core:aws-mcp 3 times (ctrl+o to expand)

⏺ No default VPC exists; the only VPC is a security-training environment
  ("AWS_GOAT_VPC") I shouldn't touch. I'll build a small dedicated test VPC
  with IGW + public subnet, then create the IAM role + instance profile and
  launch the instance. Narrating as I go.

  Called plugin:aws-core:aws-mcp (ctrl+o to expand)

⏺ VPC + networking ready. Now the IAM role for SSM.

  Called plugin:aws-core:aws-mcp (ctrl+o to expand)

⏺ Role + profile created. Now launching the instance.

  Called plugin:aws-core:aws-mcp (ctrl+o to expand)

⏺ Instance launched. Done.

  Created:
    • Instance         i-0EXAMPLE1234567890 (t3.micro, AL2023, us-east-1a)
    • VPC              vpc-0EXAMPLE1234567890 (10.99.0.0/16)
    • Subnet           subnet-0EXAMPLE1234567890 (public, auto-assign IP)
    • IGW              igw-0EXAMPLE12345678
    • Route table      rtb-0EXAMPLE12345678
    • Security group   sg-0EXAMPLE12345678 (no inbound, all egress)
    • IAM role         ec2-test-ssm-role + AmazonSSMManagedInstanceCore
    • Instance profile ec2-test-ssm-profile

  All tagged purpose=ec2-test, createdBy=aws-agent-toolkit-blog.

  Connect via Session Manager
  Wait ~2 minutes for the SSM agent to register, then:

  aws ssm start-session --target i-0EXAMPLE1234567890 --region us-east-1

  Or from the console: EC2 → Instances → select → Connect → Session Manager.

  Cleanup when done
  aws ec2 terminate-instances --instance-ids i-0EXAMPLE1234567890 --region us-east-1
  …then tear down the SG, subnet, IGW (detach first), VPC, instance profile
  (remove role first), and role. Want me to run the teardown now or leave it
  running?

Three things to notice:

  • The agent made a judgement call about which VPC to use. It ran DescribeVpcs, saw the only existing VPC was named AWS_GOAT_VPC (a known security-training environment), and chose to build fresh networking rather than reuse it. Reasonable here, but in an account where production VPCs aren’t as obviously labelled, the same reasoning could just as easily provision into the wrong place.
  • Eight resources from a one-line prompt, no confirmation gate. VPC, IGW, subnet, route table, security group, IAM role, instance profile, instance. call_aws has no “are you sure?” step: it executes whatever IAM permits, immediately.
  • Tagging and a teardown path were courtesies, not guarantees. The agent tagged everything purpose=ec2-test, createdBy=aws-agent-toolkit-blog and offered a cleanup script. Nice, but nothing in the platform required either.

The fix is upstream of the agent: scope the IAM principal that mcp-proxy-for-aws assumes. For this post’s read-the-bill workflows, the three managed policies above grant no mutating permissions, so a stray prompt cannot create, change, or delete anything.
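One way to apply that scoping is to attach the read-only managed policies to a dedicated agent role with boto3, roughly as sketched below. The role name agent-readonly-role is hypothetical, and the ARNs assume each policy name mentioned in this post exists as an AWS-managed policy under the default arn:aws:iam::aws:policy/ path; confirm that in the IAM console before relying on it.

```python
# ASSUMPTION: default path for AWS-managed policies; some live under sub-paths.
AWS_MANAGED_POLICY_PREFIX = "arn:aws:iam::aws:policy/"

# Policy names as given in this post; verify each one exists in your partition.
READ_ONLY_POLICIES = (
    "ReadOnlyAccess",
    "AWSBillingReadOnlyAccess",
    "AWSCostExplorerReadOnlyAccess",
)


def managed_policy_arns(names=READ_ONLY_POLICIES) -> list[str]:
    """Turn managed-policy names into full ARNs."""
    return [AWS_MANAGED_POLICY_PREFIX + name for name in names]


def attach_read_only(role_name: str) -> None:
    """Attach each read-only policy to the role; needs iam:AttachRolePolicy."""
    import boto3  # imported lazily so the ARN helper works without it

    iam = boto3.client("iam")
    for arn in managed_policy_arns():
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)


if __name__ == "__main__":
    # Print the ARNs for review instead of attaching anything by default.
    for arn in managed_policy_arns():
        print(arn)
    # attach_read_only("agent-readonly-role")  # hypothetical role name
```

The attach call is left commented out on purpose: reviewing the ARNs first is cheap, and granting permissions is the one step here that changes the account.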

Verdict

Worth installing. Three pieces, one workflow:

  • Skills turn vague prompts into prescribed API calls and enforce discipline rules.
  • aws-mcp removes the per-service credential wiring and puts permissions in IAM.
  • run_script sandbox runs computation next to live API data, so results are derived rather than inferred.

One thing that isn’t toolkit-specific: any coding agent with cloud credentials should run on least-privilege IAM. Use read-only for read-only work; reach for write permissions only when you’re consciously asking the agent to build.
