Claude Code for QA.
A free Claude Code tutorial for QA engineers and SDETs — learn AI test automation, agentic testing workflows, Playwright MCP, and how to use Claude AI for testing across 46 hands-on chapters.
The 30-second pitch
Four things you walk away able to do tonight.
Flaky test?
Claude rewrites the wait, replaces waitForTimeout with expect.poll, runs it 20× to prove stability — before you finish your coffee.
Jira ticket vague?
Claude reads the AC, generates Gherkin scenarios + a Playwright skeleton, files the edge-case bugs it spots — straight to QA-942.
Visual bug from support?
Drag the screenshot. Claude infers viewport, route, OS. Reproduces, captures a Playwright trace, attaches it to the bug.
Migration anxiety?
Cypress → Playwright. 240 specs. Claude converts in one overnight run · you wake to a passing suite + a PR ready to review.
Claude Code is a coding agent that lives in your terminal — reads your repo, drives your browser, writes your tests, runs your suite, files the bug. For a QA, that flips the SDLC.
You stop being the bottleneck on the right and start being the brain on the left. You design the contract · the agent does the toil. Same job title · 10× leverage.
Where Claude Code sits in your stack.
Think of it as a terminal-native pair tester. It speaks files, shells, and — through MCP servers — browsers, Jira, Confluence, Notion, GitHub, calendars, even your inbox. Your job is to orchestrate, not to type.
Read
It scans the repo, the failing log, the screenshot, the Jira ticket.
Plan
It proposes a plan in plan-mode — you approve, edit, or redirect.
Act
Edits files, runs commands, drives Playwright, posts comments.
Verify
Runs the suite. Re-reads diffs. Confirms green before claiming done.
Ship
Commits, opens PR, deploys preview. You review one diff, not 40.
Setup · macOS / Linux / WSL
Five lines. One terminal.
    # 1. install
    npm install -g @anthropic-ai/claude-code
    # 2. cd into your QA repo
    cd ~/work/qa-portfolio
    # 3. start a session
    claude
    # 4. first thing you say
    > /init        → writes CLAUDE.md
    # 5. then ask anything
    > read the repo and tell me what test framework we use
A session is just a long REPL with a model that can read anything in the working directory on demand. Anything you type with no leading slash is a prompt. Anything starting with / is a command. Anything starting with ! is a passthrough shell call.
A session has four moving parts.
Working dir
The folder you launched in. The agent will read files here. Treat it like the scope of one feature, not your whole laptop.
CLAUDE.md
Project rules, conventions, do/don't. Loaded every turn. Global one lives at ~/.claude/CLAUDE.md.
Read / Edit / Write / Bash
The four primitives. Everything else (Playwright, Jira, GitHub) is built on top via MCP servers.
Permission · Plan · Worktree
Control what runs automatically. Plan-mode = read-only thinking. Worktree = isolated branch sandbox.
Most QA workflows live entirely inside Read + Bash + a Playwright MCP. You almost never need root, and you almost never need to leave the terminal.
Three ways to talk to the agent.
Plain English
Free text. The agent decides which tools to use.
> run the login spec and screenshot every failure
Slash command
Built-in or custom. Predictable, repeatable.
    > /init
    > /review
    > /compact
Shell escape
Run any shell command directly, no agent.
    > !npx playwright test --headed
    > !git status
Rule of thumb · use prompts for thinking, slash commands for workflows you repeat, and shell when you already know the exact command.
The complete cheat-sheet
Every slash command a QA touches.
QA-authored customs to add to .claude/commands/ · /flaky · /smoke · /bug-from-trace · /audit-locators · /gen-pom · /triage-failures · /quarantine · /report-run.
Need the full cheat-sheet?
The complete keyboard shortcuts · slash commands · config & env · skills & agents reference card lives in the Reference appendix at the end of the deck. Jump there anytime — print it, pin it above your desk.
Read · Edit · Write · Bash.
Plus Grep · Glob · WebFetch. Everything else is sugar on top.
    > open tests/login.spec.ts, find the assertion that checks the toast,
      and tighten it to verify both text and aria-role

    // agent will:
    Read(tests/login.spec.ts)
    Edit(tests/login.spec.ts)                        // 1 hunk
    Bash(npx playwright test login --reporter=line)  // reports green ✓
You never call the tools directly — you describe the intent and the agent picks the tool. The cool part: it shows you every tool call before / as it runs, so you stay in the loop.
Read · cat with a brain
Pulls only the slice it needs: line ranges, page ranges, image content.
Edit · exact-string swap
Fails loudly if the target isn't unique. Safer than sed.
Write · new files
Whole-file create / overwrite. Used sparingly.
Bash · your shell
Runs tests, git, curl, anything. Honours permissions.
The single file that changes everything
Teach the agent your house rules once.
A CLAUDE.md at the repo root is auto-loaded every turn. Put your test framework, your locator policy, your no-flake rules. The agent obeys it.
    # QA conventions — qa-portfolio

    ## Locators — STRICT
    - Prefer getByRole / getByLabel.
    - Never use raw .locator('xpath=…').
    - Brittle CSS selectors must include a comment why.

    ## Waits
    - No page.waitForTimeout() in committed code.
    - Use auto-waiting + expect.poll only.

    ## Test data
    - Generate via faker; never hard-code emails.

    ## Commits
    - Conventional Commits. No co-author trailers.
Three levels of memory cascade — global → project → local. Local overrides project, project overrides global. Lowest line wins.
| Scope | Path |
|---|---|
| Global | ~/.claude/CLAUDE.md |
| Project | ./CLAUDE.md |
| Local (gitignored) | ./CLAUDE.local.md |
What goes inside CLAUDE.md
Nine sections every QA repo needs.
Treat CLAUDE.md as your team's pair-programming contract. Write it once, every session obeys it. Run /init to scaffold, then edit by hand. Lives at the repo root.
    # QA Conventions · qa-portfolio

    ## 1. Stack
    - Framework: Playwright 1.49+
    - Test runner: @playwright/test
    - Lang: TypeScript strict
    - Node: 20.x · pnpm

    ## 2. Folder layout
    - tests/e2e/ · browser specs
    - tests/api/ · APIRequestContext
    - tests/fixtures/ · shared fixtures
    - tests/pom/ · page objects
    - tests/data/ · faker builders

    ## 3. Locators — STRICT
    - Prefer getByRole, getByLabel, getByTestId.
    - Never raw .locator('xpath=…').
    - Brittle CSS must include a "// why" comment.

    ## 4. Waits
    - No page.waitForTimeout.
    - Use auto-wait + expect.poll only.
    - Retry once, then quarantine.

    ## 5. Data
    - Generate with @faker-js/faker.
    - Never hard-code emails / phones / addresses.
    - Test users: env-based, not committed.

    ## 6. Tagging
    - @smoke @regression @flaky @wip
    - CI runs @smoke on every PR.

    ## 7. Reporting
    - HTML + JSON reporter on CI.
    - Attach trace + screenshot on retry.

    ## 8. Commits
    - Conventional Commits.
    - No co-author trailers.
    - No "🤖 Generated with…" footers.

    ## 9. Do / Don't
    - DO: ask before deleting any spec.
    - DO: run the impacted spec after every edit.
    - DON'T: edit playwright.config without a plan.
    - DON'T: bump deps without a separate PR.
Why each section matters
| Section | Why it matters |
|---|---|
| Stack | Agent picks correct imports / matchers. |
| Layout | New files land in the right folder. |
| Locators | Kills the most common flake source. |
| Waits | No fixed sleeps survive review. |
| Data | No PII or hard-coded secrets. |
| Tagging | CI lanes stay predictable. |
| Reporting | Bug repros come with evidence. |
| Commits | Clean git history. |
| Do/Don't | Hard rails on destructive ops. |

Three scopes load in cascade · ~/.claude/CLAUDE.md (global) → ./CLAUDE.md (project) → ./CLAUDE.local.md (gitignored personal). Lowest line wins.
Type # in any prompt to append a line to CLAUDE.md live.
Hire specialists, not generalists.
A subagent is a separate Claude session spawned for one bounded job. Its output is summarised back — your main context stays clean. Think of them as contractors who clock out when done. Note · the Plan subagent below is the planner specialist — distinct from Plan-mode (⇧Tab toggle, read-only thinking on the main agent).
Explore
Read-only code locator. "Where is X defined? What calls Y?" Fast.
> explore: find every place we click "Add to cart"
Plan
Architect mode. Designs the implementation plan before any code change.
> plan: add a parallel visual regression suite for /pricing
code-reviewer
Audits a diff. One line per finding, severity-tagged. No fluff.
> review the last 3 commits for race conditions
e2e-runner
Owns Playwright. Generates, maintains, quarantines flaky specs.
> e2e: add coverage for password-reset happy path
tdd-guide
Enforces tests-first. Will refuse to write impl before a failing test.
> tdd: implement the new promo-code validator
security-reviewer
OWASP top-10 sweep on the diff. Flags secrets, SSRF, injection, XSS.
> security: review the new /auth/reset endpoint
Two ways to scale your main agent
Subagent vs Agent Team.
Old model · main agent spawns isolated subagents; results flow upward. New model · main agent acts as team lead, all teammates share a task list, communicate peer-to-peer. Pick the model that fits the job.
When to pick which
| Task shape | Pick | Why |
|---|---|---|
| Independent fan-out (review 20 PRs) | Subagents | No shared state needed; isolated runs scale linearly. |
| Coordinated build (scaffold + tests + docs) | Agent Team | Shared task list keeps teammates from duplicating work. |
| One-shot codebase locate | Subagent | Single Explore agent · output summarised back. |
| Long-running migration | Agent Team | Peer-to-peer ack lets agents hand off mid-flight. |
| Bug-bash across surfaces | Agent Team | Shared task list = no two agents reproducing same bug. |
From contractors → task board → engineering team
The three levels of Claude Code agents.
As coordination needs grow, you climb the ladder · isolated Subagents → managed Agent View → collaborative Agent Teams. Each level adds shared state and inter-agent communication. Pick the lowest level that fits the job.
When to pick which level
| You need | Level | Why |
|---|---|---|
| Review a single PR | Subagent (L1) | One bounded ask · contractor you send a brief. |
| Run 8 lint + a11y + perf sweeps in parallel | Agent View (L2) | Need to dispatch + peek progress without losing terminal. |
| Build a feature with spec → backend → frontend → tests | Agent Team (L3) | Tasks depend on each other · shared task list keeps state. |
| Migrate Cypress → Playwright across 12 packages | Agent Team (L3) | Cross-package coordination · agents talk + claim work. |
| "Hunt this flake" | Subagent (L1) | One spec, one fix · zero need for shared state. |
What is a Skill
Skills are playbooks Claude reads on demand.
A Skill is a reusable, versioned collection of instructions, resources, and examples that teaches Claude Code how to complete a specific type of task. Drop one into .claude/skills/<name>/SKILL.md (repo) or ~/.claude/skills/<name>/SKILL.md (personal). Claude auto-discovers and auto-triggers on description match.
Reusable Expertise
Encapsulate proven workflows and best practices once. Every future session uses them.
Consistent Results
Deliver reliable outcomes by following defined steps + context. Same input · same output.
Composable
Use alone or combine with hooks + plugins for powerful, multi-step automations.
Share & Collaborate
Ship skills via git to your team — or release to the community marketplace.
How a Skill runs · 5 steps
You request a task
Describe what you need in Claude Code · plain prompt.
Claude selects a Skill
Most relevant skill chosen by matching its description field.
Skill executes
Claude follows the skill's procedure · uses tools, plugins, files.
Results produced
Structured outputs · code, files, or actions as defined.
Claude responds
Presents results + continues the conversation.
A real skill · flake-hunter
    ---
    name: flake-hunter
    description: Use when a Playwright spec fails intermittently. Locates the wait, rewrites it to expect-based polling, re-runs 10× to confirm stability.
    ---

    ## Steps
    1. Read the failing spec.
    2. Identify any waitForTimeout / sleep.
    3. Replace with expect.poll().
    4. Run npx playwright test --repeat-each=10.
    5. Report flake-rate before vs after.
QA-relevant built-ins
Trigger live · type "hunt the flake in checkout.spec.ts". Claude matches against the skill's description field and loads it automatically.
A good description reads "Use when [condition] · [what it does] · [stop conditions]."
Canonical docs · build your own
docs.claude.com · Skills
Full reference · SKILL.md frontmatter, lifecycle, allowed-tools, packaging, distribution.
github · anthropics/skills
First-party skill examples · skill-creator, pdf, xlsx, pptx, docx, content pipelines.
spec · Agent Skills overview
How skills, agents, hooks, and MCP plug together in the Claude ecosystem.
Build your own skill in under 10 minutes
A Skill is a markdown file with a recipe.
Frontmatter declares name + description. Body holds the procedure. Drop in ~/.claude/skills/<skill>/SKILL.md (global) or .claude/skills/<skill>/SKILL.md (repo-scoped). Claude auto-discovers and triggers on description match.
    ---
    name: flake-hunter
    description: Use when a Playwright spec fails intermittently or has any sleep / waitForTimeout / networkidle wait. Locates the bad wait, rewrites with expect.poll, re-runs 20x to confirm stability.
    allowed-tools: Read, Edit, Bash, Grep, Glob
    ---

    # Flake Hunter

    ## When to trigger
    - Spec failed retry on CI.
    - User says "this is flaky" / "intermittent".
    - Code contains waitForTimeout / sleep.

    ## Procedure
    1. Read the spec end-to-end.
    2. Grep for: waitForTimeout, sleep, networkidle, hard delays.
    3. For each hit, replace with expect.poll or explicit element wait.
    4. Run npx playwright test {file} --repeat-each=20 --workers=1.
    5. Report flake-rate before vs after as a markdown table.
    6. If still flaky > 5%, surface the most-likely locator candidate.

    ## Stop conditions
    - 20/20 pass · report success.
    - Any locator looks brittle · ask user.
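Steps 5 and 6 of the procedure are plain arithmetic, so they are easy to sketch. The block below is a minimal illustration, not part of the skill itself: the `RunResult` shape and the function names are hypothetical, and it assumes you already have one pass/fail record per repeated run.

```typescript
// Hypothetical shape: one entry per repeated run of the same spec.
type RunResult = { passed: boolean };

// Share of failed runs, as a percentage rounded to one decimal place.
function flakeRate(runs: RunResult[]): number {
  if (runs.length === 0) return 0;
  const failed = runs.filter((r) => !r.passed).length;
  return Math.round((failed / runs.length) * 1000) / 10;
}

// Step 5: render the before/after comparison as a markdown table.
function flakeReport(before: RunResult[], after: RunResult[]): string {
  const row = (label: string, runs: RunResult[]): string => {
    const failures = runs.filter((r) => !r.passed).length;
    return `| ${label} | ${runs.length} | ${failures} | ${flakeRate(runs)}% |`;
  };
  return [
    "| Phase | Runs | Failures | Flake rate |",
    "|---|---|---|---|",
    row("Before", before),
    row("After", after),
  ].join("\n");
}

// Step 6: the spec still counts as flaky if more than 5% of runs fail.
function stillFlaky(after: RunResult[]): boolean {
  return flakeRate(after) > 5;
}

// Example data: 4 failures in 20 runs before the fix, none after.
const beforeRuns: RunResult[] = Array.from({ length: 20 }, (_, i) => ({ passed: i % 5 !== 0 }));
const afterRuns: RunResult[] = Array.from({ length: 20 }, () => ({ passed: true }));
console.log(flakeReport(beforeRuns, afterRuns));
```

Keeping the threshold in one small function makes the stop condition easy to tune per team.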
Frontmatter fields
| Field | Purpose |
|---|---|
| name | Slug · used in /skills. |
| description | The trigger phrase. Be vivid · Claude matches on this. |
| allowed-tools | Whitelist of tools the skill may call. |
| model | Optional override · sonnet / opus / haiku. |
| color | Optional · sidebar accent. |
Scaffold faster · use skill-creator
    > /skill-creator

    // or, with the skill name:
    > use skill-creator to build a skill called "locator-auditor" that scans tests/ for raw xpath and proposes role-based replacements
Reference links
docs · docs.claude.com/en/docs/claude-code/skills
repo · github · anthropics/skills
spec · SKILL.md frontmatter reference
tool · skill-creator · scaffolds new skills
guide · anthropic-skills · skill-creator (built-in)
Don't write skills from scratch · install them
Skill marketplaces · qaskills.sh + skills.sh.
Two community registries let you npx-install pre-built skills into your Claude Code (or Cursor / Copilot / Windsurf). qaskills.sh is QA-only · 450+ testing skills. skills.sh is the broader directory across all domains. Both ship via one command · zero copy-paste.
qaskills.sh — QA Skills Directory
450+ ready-to-install skills for testing — Playwright E2E, Selenium, API generation, security audits, WCAG 2.2 a11y, Jest unit, flake detection, perf, CI optimisation. Browse by category or leaderboard. Works with Claude Code, Cursor, Copilot, Windsurf, Gemini · 26+ agents total.
    npx @qaskills/cli add playwright-e2e
    npx @qaskills/cli add api-test-generator
    npx @qaskills/cli add flaky-test-detector
    npx @qaskills/cli add wcag-audit
    npx @qaskills/cli add selenium-converter

general · all domains
skills.sh — Agent Skills Directory
Searchable marketplace of agent skills across React, Next.js, Design, Mobile, Databases, Testing, Marketing. Browse trending / official / security tracks. Install any owner/repo with one command.
    npx skills add anthropic/frontend-design
    npx skills add vercel/react-best-practices
    npx skills add microsoft/azure-ai
    npx skills add anthropic/skill-creator
    npx skills add <owner/repo>   # any GitHub skill
QA-relevant skills worth installing first day
| Skill | Source | What it gives you |
|---|---|---|
| playwright-e2e | qaskills.sh | Best-practice Playwright spec authoring with Page Object Model + fixtures. |
| api-test-generator | qaskills.sh | Generates positive · negative · schema · auth coverage from one curl or OpenAPI URL. |
| flaky-test-detector | qaskills.sh | Hunts waitForTimeout · networkidle · sleeps · rewrites with expect.poll. |
| wcag-audit | qaskills.sh | WCAG 2.2 a11y sweep · pulls axe-core · reports per-rule violations. |
| selenium-to-playwright | qaskills.sh | Mechanical migration helper · POM + waits + selectors translation. |
| perf-baseline | qaskills.sh | k6 / Lighthouse baseline harness · saves perf budget JSON. |
| frontend-design | skills.sh · anthropic | Distinctive UI generation · already used inside this masterclass deck. |
| skill-creator | skills.sh · anthropic | Scaffolds new skills with frontmatter + procedure + examples folder. |
| security-review | skills.sh · official | OWASP top-10 sweep over a diff · same engine as /security-review. |
    # 1. land in your test repo
    cd ~/work/my-qa-repo && claude

    # 2. paste this prompt — claude installs the lot for you
    > install these skills into .claude/skills/ and update CLAUDE.md
      to mention each one is available:
      - npx @qaskills/cli add playwright-e2e
      - npx @qaskills/cli add api-test-generator
      - npx @qaskills/cli add flaky-test-detector
      - npx @qaskills/cli add wcag-audit
      - npx skills add anthropic/skill-creator
      - npx skills add anthropic/security-review
      After install, run /skills and confirm all six show up.
Use marketplace skill
Generic, reusable patterns · authoring tests, audits, migrations. Don't reinvent.
Write your own
Company-specific rules, internal framework wrappers, secret locator policy.
Fork an existing one
Marketplace skill ~90% right · clone to .claude/skills/, tweak the description + procedure, commit.
npx @qaskills/cli list shows everything installed in current repo · pair with /skills inside Claude to verify load order.
Hooks fire around tool calls.
A hook is a shell command the harness runs at a lifecycle event. Use them to auto-format on save, run a smoke test after every write, or block edits to main.
    "hooks": {
      "PostToolUse": [
        {
          "matcher": "Edit|Write",
          "hooks": [
            { "type": "command", "command": "npx prettier --write $CLAUDE_FILE_PATH" }
          ]
        }
      ],
      "SessionStart": [
        {
          "hooks": [
            { "type": "command", "command": "echo '🧪 QA mode ready. Run /init.'" }
          ]
        }
      ]
    }
| Event | Fires |
|---|---|
| SessionStart | Once when you launch claude. |
| UserPromptSubmit | Every time you press enter. |
| PreToolUse | Before any tool runs (can block). |
| PostToolUse | After each tool call succeeds. |
| Stop | When the assistant finishes a turn. |
QA pattern · PostToolUse on Edit → run the spec that owns the changed file. Feedback loop drops to seconds.
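The "run the spec that owns the changed file" pattern needs two decisions: which file was edited, and which spec covers it. The sketch below assumes the hook receives the tool call as JSON on stdin with a `tool_input.file_path` field, and it invents a naming convention (`src/**/cart.ts` is covered by `tests/e2e/cart.spec.ts`) purely for illustration. Check both assumptions against the hooks reference and your own repo layout.

```typescript
// Assumed payload shape for a PostToolUse hook; only the fields used here.
interface HookPayload {
  tool_name?: string;
  tool_input?: { file_path?: string };
}

// Pull the edited file path out of the raw JSON the hook receives.
function editedFile(rawJson: string): string | null {
  try {
    const payload = JSON.parse(rawJson) as HookPayload;
    return payload.tool_input?.file_path ?? null;
  } catch (err) {
    return null; // malformed input: the hook should simply do nothing
  }
}

// Hypothetical convention: src/foo/cart.ts is covered by tests/e2e/cart.spec.ts.
function owningSpec(filePath: string): string | null {
  if (/\.spec\.ts$/.test(filePath)) return filePath; // a spec owns itself
  const match = /([^/]+)\.tsx?$/.exec(filePath);
  return match ? `tests/e2e/${match[1]}.spec.ts` : null;
}

// Build the command the hook would run, or null when nothing applies.
function hookCommand(rawJson: string): string | null {
  const file = editedFile(rawJson);
  const spec = file ? owningSpec(file) : null;
  return spec ? `npx playwright test ${spec} --reporter=line` : null;
}

const sample = '{"tool_name":"Edit","tool_input":{"file_path":"src/checkout/cart.ts"}}';
console.log(hookCommand(sample));
```

A real hook script would read stdin, pass the text to `hookCommand`, and execute the result; returning `null` for unknown files keeps the hook silent on edits that have no obvious spec.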
MCP turns Claude into a browser, a Jira, a Notion, a Gmail.
Model Context Protocol is the USB-C of AI tools. A small server exposes verbs (navigate, click, createJiraIssue); Claude calls them like any other tool. For QA, the killer one is the Playwright MCP.
Playwright MCP
Drive Chromium, Firefox, WebKit. Snapshot the a11y tree, click by role, screenshot, network log.
Atlassian MCP
Read/write Jira issues, Confluence pages. File bugs straight from a failed test.
GitHub via gh
PRs, issues, checks, releases. No extra MCP needed — uses gh CLI under the hood.
Notion MCP
Pull the test plan, push the run report. Round-trip in one prompt.
Gmail / Calendar
Draft the release email, find the freeze window, never alt-tab.
Your own MCP
Wrap your internal API. Spec is just JSON-RPC. ~50 lines to start.
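To make "Spec is just JSON-RPC" concrete, here is a minimal dispatcher sketch for the two tool methods, `tools/list` and `tools/call`. It is not a working MCP server: the initialize handshake, the stdio or HTTP transport, and input schemas are all left out, and `get_run_status` is a hypothetical internal tool.

```typescript
// JSON-RPC 2.0 envelope types, trimmed to what this sketch needs.
interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}
interface RpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

// Hypothetical internal tool: look up the status of a test run.
const tools: Record<string, (args: Record<string, unknown>) => string> = {
  get_run_status: (args) => `run ${String(args["runId"])}: passed`,
};

function handle(req: RpcRequest): RpcResponse {
  const base = { jsonrpc: "2.0" as const, id: req.id };
  if (req.method === "tools/list") {
    const listed = Object.keys(tools).map((name) => ({ name }));
    return { ...base, result: { tools: listed } };
  }
  if (req.method === "tools/call") {
    const tool = tools[req.params?.name ?? ""];
    if (!tool) return { ...base, error: { code: -32602, message: "Unknown tool" } };
    const text = tool(req.params?.arguments ?? {});
    return { ...base, result: { content: [{ type: "text", text }] } };
  }
  // Standard JSON-RPC code for an unsupported method.
  return { ...base, error: { code: -32601, message: "Method not found" } };
}

const reply = handle({
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "get_run_status", arguments: { runId: 42 } },
});
console.log(JSON.stringify(reply));
```

In practice an MCP SDK supplies the envelope handling, so your own code shrinks to the `tools` map.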
Demo · authoring a spec without typing locators
"Open saucedemo.com, log in, add a backpack, screenshot the cart."
▸ browser_navigate('https://saucedemo.com')
▸ browser_snapshot() · captured a11y tree
▸ browser_fill_form(user='standard_user', pass=•••)
▸ browser_click(role=button, name='Login')
▸ browser_click(role=button, name='Add to cart', within='Sauce Labs Backpack')
▸ browser_click(role=link, name='cart')
▸ browser_take_screenshot(path='cart.png')
▸ Write(tests/cart-add.spec.ts) ✓
done · spec runs green on first try
    import { test, expect } from '@playwright/test';

    test('standard user adds backpack to cart', async ({ page }) => {
      await page.goto('https://www.saucedemo.com');

      // Log in with the demo credentials.
      await page.getByRole('textbox', { name: /user/i }).fill('standard_user');
      await page.getByRole('textbox', { name: /pass/i }).fill('secret_sauce');
      await page.getByRole('button', { name: 'Login' }).click();

      // Add the first product and open the cart.
      await page.getByRole('button', { name: /add to cart/i }).first().click();
      await page.getByRole('link', { name: /cart/i }).click();

      await expect(page.getByText('Sauce Labs Backpack')).toBeVisible();
    });
Notice that every interaction uses getByRole (only the final assertion falls back to getByText). No CSS, no XPath. The agent reads the a11y tree, so it picks accessible names by default.
API tests from a single curl.
Paste a curl, an OpenAPI URL, or a Postman export. The agent infers the contract and generates a fixture-based suite: schema validation, negative paths, auth variants, the lot.
    > generate API tests for this endpoint with positive, negative,
      schema, and auth coverage. use playwright APIRequestContext,
      group by describe, log only on fail.

    curl -X POST https://api.demo.dev/v1/users \
      -H 'Authorization: Bearer $T' \
      -H 'Content-Type: application/json' \
      -d '{"email":"a@b.co","plan":"pro"}'
    import { test, expect } from '@playwright/test';
    import { z } from 'zod';

    // Response contract inferred from the curl example.
    const User = z.object({
      id: z.string().uuid(),
      email: z.string().email(),
      plan: z.enum(['free', 'pro']),
    });

    // Relative paths assume baseURL and the Authorization header are set
    // under `use` in playwright.config.ts.
    test.describe('POST /v1/users', () => {
      test('creates a pro user', async ({ request }) => {
        const r = await request.post('/v1/users', {
          data: { email: 'a@b.co', plan: 'pro' },
        });
        expect(r.status()).toBe(201);
        User.parse(await r.json()); // throws if the body breaks the schema
      });

      test('rejects bad email', async ({ request }) => {
        const r = await request.post('/v1/users', {
          data: { email: 'not-an-email', plan: 'pro' },
        });
        expect(r.status()).toBe(400);
      });
    });
Tests from requirements
A Jira ticket in. A test plan out.
> fetch QA-482 from jira, read the acceptance criteria, produce: 1) a Gherkin scenarios file, 2) a Playwright skeleton, 3) a coverage matrix mapping each AC to a test id.
    Feature: Guest checkout

      Scenario: Apply valid promo
        Given I have a backpack in my cart
        When I apply promo "SAVE10"
        Then the subtotal drops by 10%
        And the promo chip shows "-10%"

      Scenario: Reject expired promo
        Given I have any item in my cart
        When I apply promo "BLACKFRIDAY23"
        Then I see error "This code has expired"

      Scenario Outline: Country-specific tax
        Given I check out from <country>
        Then tax shows <tax>%

        Examples:
          | country | tax |
          | IN      | 18  |
          | US-CA   | 9   |
          | DE      | 19  |
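The third deliverable in the prompt, a coverage matrix mapping each AC to a test id, is simple enough to sketch by hand. The shapes and ids below (`AC-1`, `T-101`, and so on) are hypothetical; the point is only that every criterion gets a row and gaps are called out explicitly.

```typescript
// Hypothetical shapes: criteria from the ticket, tests tagged with the ACs they cover.
type Criterion = { id: string; text: string };
type TestCase = { id: string; covers: string[] };

// Ids of criteria that no test claims to cover.
function uncovered(criteria: Criterion[], tests: TestCase[]): string[] {
  return criteria
    .filter((ac) => !tests.some((t) => t.covers.includes(ac.id)))
    .map((ac) => ac.id);
}

// One markdown row per criterion, listing the tests that cover it.
function coverageMatrix(criteria: Criterion[], tests: TestCase[]): string {
  const rows = criteria.map((ac) => {
    const ids = tests.filter((t) => t.covers.includes(ac.id)).map((t) => t.id);
    const cell = ids.length > 0 ? ids.join(", ") : "UNCOVERED";
    return `| ${ac.id} | ${ac.text} | ${cell} |`;
  });
  return ["| AC | Criterion | Test ids |", "|---|---|---|", ...rows].join("\n");
}

const criteria: Criterion[] = [
  { id: "AC-1", text: "Valid promo reduces subtotal" },
  { id: "AC-2", text: "Expired promo shows an error" },
  { id: "AC-3", text: "Tax varies by country" },
];
const tests: TestCase[] = [
  { id: "T-101", covers: ["AC-1"] },
  { id: "T-102", covers: ["AC-2"] },
];
console.log(coverageMatrix(criteria, tests));
```

Printing `UNCOVERED` instead of leaving the cell blank makes a missing scenario hard to overlook in review.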
From a screenshot to a filed bug.
> [screenshot.png attached] user reports the price chip overflows on the pricing card on mobile. reproduce in Chromium @ 390×844, capture a trace, attach screenshot, file a Jira bug in QA project with steps + expected/actual.
The agent reads the image, drives a 390×844 viewport, captures a Playwright trace, posts everything to Jira as a single bundled bug. You sip coffee.
▸ identified component: PricingCard / .price__chip
▸ browser_resize(390, 844)
▸ browser_navigate('/pricing')
▸ overflow confirmed → trace.zip (1.2 MB)
▸ createJiraIssue(QA, type=Bug)
→ QA-941 "Pricing chip overflow @ 390×844"
steps · expected · actual · screenshot · trace
severity: Minor · component: pricing-card
done
Drag a screenshot. Get tests.
Claude is multimodal — it reads images natively. Paste a Figma export, a customer's broken-UI screenshot, a flaky CI run's failure image. It identifies the component, locates it in code, proposes a fix or a test.
Figma → test
Drop a Figma frame. Get a visual-regression spec that snapshots the matching live route.
User report → repro
Paste customer screenshot. Claude infers viewport, route, OS hints, and reproduces.
CI failure → root cause
Drop a failed-snapshot diff. Claude reads both, explains the visual delta in plain English.
QA reviews code too. Now they have leverage.
| Command | Outcome |
|---|---|
| /review | Severity-tagged findings on the current diff. |
| /security-review | Auth, SSRF, injection, secret leakage. |
| /ultrareview | Multi-agent cloud review of the branch / PR. |
| /caveman-review | One-line-per-finding terse review. |
Each finding follows the format path:line: severity: problem · fix. Easy to triage, easy to paste into a PR comment.
    tests/login.spec.ts:14: ⚠ medium: page.waitForTimeout(2000) introduces fixed sleep. fix: use expect.poll() on the dashboard heading.
    src/auth/middleware.ts:42: 🔴 high: token comparison uses ==, vulnerable to timing attack. fix: crypto.timingSafeEqual.
    playwright.config.ts:8: ◇ low: retries: 3 hides flakes. fix: cap at 1 and quarantine instead.
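Because the findings follow a fixed shape, they can be parsed for triage dashboards or PR bots. The sketch below is written against the three sample lines shown here (icon, severity word, then a `fix:` marker); it is an assumption that real review output always matches that layout, so unparseable lines return `null` rather than throwing.

```typescript
type Severity = "low" | "medium" | "high";

interface Finding {
  path: string;
  line: number;
  severity: Severity;
  problem: string;
  fix: string;
}

// path:line: [icon] severity: problem fix: remedy
const FINDING = /^(\S+):(\d+): (?:\S+ )?(low|medium|high): (.+?) fix: (.+)$/;

function parseFinding(text: string): Finding | null {
  const m = FINDING.exec(text.trim());
  if (!m) return null;
  return {
    path: m[1],
    line: Number(m[2]),
    severity: m[3] as Severity,
    problem: m[4],
    fix: m[5],
  };
}

// Highest severity first, so the riskiest findings lead the PR comment.
function bySeverity(findings: Finding[]): Finding[] {
  const rank: Record<Severity, number> = { high: 0, medium: 1, low: 2 };
  return findings.slice().sort((a, b) => rank[a.severity] - rank[b.severity]);
}

const parsed = parseFinding(
  "tests/login.spec.ts:14: ⚠ medium: page.waitForTimeout(2000) introduces fixed sleep. fix: use expect.poll() on the dashboard heading."
);
console.log(parsed);
```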
Run Claude in your pipeline.
The same CLI is headless. Put it on GitHub Actions to auto-triage failing tests, generate PR review comments, or open a Jira when a smoke fails on main.
    name: qa-bot
    on: { pull_request: { types: [opened, synchronize] } }
    jobs:
      triage:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Install Playwright
            run: npx playwright install --with-deps
          - name: Claude review + smoke
            env:
              ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
            run: |
              npx -y @anthropic-ai/claude-code -p \
                "review this PR for QA risk, then run npx playwright test --grep @smoke and post a summary as a PR comment"
The -p flag (print/headless) gives you a one-shot run that prints its result to stdout and exits. Perfect for cron, CI, or wrapping in your own scripts.
Cost guardrails · keep CI bills bounded
Per-job ceiling
Wrap claude -p with a timeout (timeout 600s) and a token cap via CLAUDE_MAX_TOKENS. Fail-fast beats infinite spend.
Pick the cheapest model that works
Use Haiku for triage / log-summary jobs. Reserve Sonnet for review jobs. Avoid Opus on every PR.
Diff-only context
Pipe git diff origin/main instead of the full repo. 90% of review jobs need only the diff plus 2-3 related files.
Prompt caching on
Set ENABLE_PROMPT_CACHING_1H=1. Repeated reviewer prompts hit the 1h cache and bill at <10% of full price.
Batch · don't fan out
One Claude job reviewing 5 PRs in sequence beats 5 jobs reviewing 1 PR each — shared system prompt cached.
Spend alerts
Anthropic console · set a monthly soft + hard limit. Slack webhook on threshold = no surprise invoice.
    # Pipe the diff in on stdin; the prompt tells the agent what to do with it.
    git diff origin/main | timeout 600s npx -y @anthropic-ai/claude-code -p \
      --model haiku \
      --max-turns 6 \
      "review the changes in this PR for QA risk only. focus on: flaky waits, hard-coded data, missing assertions. skip style nits. cap report at 8 bullets."
Plan first. Then let it loose.
Read-only thinking
Toggle with ⇧Tab. Claude reads, searches, designs — but cannot Edit / Write / shell-mutate. Approve the plan, then exit and execute.
large refactors · scary migrations · unfamiliar codebases · auth changes
Isolated sandbox
Spin a git worktree on a temp branch. Agent works there. If it goes sideways, you delete the dir. Your main checkout never moves.
> work on this in a worktree so my dev server keeps running
Together these are the two safety belts that let you run the agent autonomously on a long task while you go to lunch. Come back to a green PR.
Things that will save you a workday.
| Shortcut | What it does |
|---|---|
| @filename | Reference a file inline. Tab-completes paths. |
| # | Quick-add a CLAUDE.md note from the prompt. |
| ⇧Tab | Toggle Plan ↔ Auto-Accept Edits mode. |
| Esc | Cancel current tool call without killing session. |
| Esc Esc | Rewind — pick an older message and branch. |
| /compact | Hand-roll a context summary when token budget tight. |
| !cmd | Shell passthrough — no agent involvement. |
| /cost | See spend before continuing the next sweep. |
| /resume | Pick up yesterday's session by id. |
| --continue | Resume the last session non-interactively. |
| --dangerously-skip-permissions | YOLO mode for sandboxed containers only. |
| claude -p "…" | One-shot run, prints answer, exits. CI-friendly. |

Make your own /commands
Your team. Your verbs.
Drop a markdown file in .claude/commands/. The filename becomes the command. Inside, write the prompt template — with placeholders. Share it via git so every QA in your team has the same playbook.
    ---
    description: Hunt a flaky Playwright spec
    argument-hint: [spec path]
    ---

    You are debugging a flaky Playwright test.
    Spec path: $ARGUMENTS

    Do these in order:
    1. Read the spec end-to-end.
    2. Find every waitForTimeout, sleep, hard-coded delay, or networkidle wait.
    3. Replace with expect.poll / explicit element wait.
    4. Run it --repeat-each=20.
    5. Report flake rate before vs after as a table.

    Stop and ask if any locator looks brittle.
    > /flaky tests/checkout.spec.ts

    // claude expands the template, fills $ARGUMENTS, and runs the playbook
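The expansion step can be pictured as plain text substitution. This is a toy model of the behaviour described above, not Claude Code's actual implementation: it assumes the frontmatter is stripped before the prompt is sent and that every `$ARGUMENTS` occurrence is replaced with whatever follows the command name.

```typescript
// Strip a leading frontmatter block (--- ... ---) if one is present.
function stripFrontmatter(markdown: string): string {
  const m = /^---\n[\s\S]*?\n---\n?/.exec(markdown);
  return m ? markdown.slice(m[0].length) : markdown;
}

// Replace every $ARGUMENTS placeholder with the text typed after the command.
// split/join avoids the special meaning of "$" in String.replace patterns.
function expandCommand(markdown: string, args: string): string {
  return stripFrontmatter(markdown).split("$ARGUMENTS").join(args).trim();
}

const template = [
  "---",
  "description: Hunt a flaky Playwright spec",
  "argument-hint: [spec path]",
  "---",
  "You are debugging a flaky Playwright test.",
  "Spec path: $ARGUMENTS",
].join("\n");

console.log(expandCommand(template, "tests/checkout.spec.ts"));
```

Thinking of a command as "template plus arguments" is a useful check when authoring one: if the expanded text would not make sense as a standalone prompt, the template needs more context.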
Useful QA commands to author
A QA day · before vs after.
Before
| Time | What happens |
|---|---|
| 09:00 | Stand-up. 4 flaky tests overnight. |
| 09:30 | Open Jira. Read AC. Translate to Gherkin by hand. |
| 10:30 | Hunt the locator. Tweak the wait. Re-run. Cry. |
| 12:00 | Lunch · still 3 flakes. |
| 14:00 | Write spec body. Stack-overflow the matcher. |
| 16:00 | Smoke breaks. Bisect commits manually. |
| 17:30 | File bug. Attach screenshot. Update Confluence. |
| 18:30 | Out the door. Backlog grew. |

After

| Time | What happens |
|---|---|
| 09:00 | Stand-up. /triage-failures ran overnight. |
| 09:15 | /gen-tests QA-482 · Gherkin + skeleton in 2 min. |
| 09:45 | /flaky checkout.spec.ts · agent fixes, retries ×20. |
| 11:00 | Review the diff. Approve. PR opened. |
| 12:00 | Lunch · backlog smaller. |
| 14:00 | Exploratory session with Playwright MCP. |
| 15:00 | /bug-from-trace · 3 bugs filed in 5 min. |
| 16:00 | Deep work — architecture, risk, mentoring. |
| 17:00 | Out the door. Slept fine. |

One session, every job vs one agent, one job
Vibe Coding vs Software Factory.
Pile every role onto one Claude session and you get context chaos — research notes, frontend code, QA assertions, reviewer remarks all bleed into the same buffer. The fix · split roles across small, focused agents in a pipeline. Each agent owns one job, with clean context, and hands off to the next.
What this means for QA
Symptom · main session is doing too much
The same Claude turn just researched the Jira, wrote Playwright tests, reviewed your migration, and drafted the PR description. By turn 30 it's forgotten which framework you're using.
Fix · pipeline with explicit roles
Researcher subagent reads AC → Spec subagent writes Gherkin → Tests subagent writes Playwright specs → Validator subagent runs the suite + reports. Each gets its own context.
> Build a software factory pipeline for QA-482.
Step 1 · researcher subagent · pull AC from Jira, summarise.
Step 2 · spec subagent · turn AC into Gherkin scenarios.
Step 3 · test subagent · write Playwright specs from Gherkin.
Step 4 · validator subagent · run the suite, report pass/fail.
Each step in its own context · pass only the artefact forward.
Stop and ask if any step output looks off.
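The key property of the pipeline is the handoff: each stage receives only the previous stage's artefact, never its working notes. The sketch below models that with typed stages. The stage bodies are trivial stand-ins (in a real run each one would be a subagent call), and all type and file names are hypothetical.

```typescript
// A stage turns one artefact into the next and sees nothing else.
type Stage<I, O> = { name: string; run: (input: I) => O };

// Hypothetical artefact types for the four steps.
type Criteria = string[];
type Scenarios = string[];
type Specs = { file: string; scenario: string }[];
type Report = { total: number; files: string[] };

// Stand-in bodies: real stages would delegate to a subagent.
const researcher: Stage<string, Criteria> = {
  name: "researcher",
  run: (ticket) => ticket.split("\n").map((s) => s.trim()).filter((s) => s.length > 0),
};
const specWriter: Stage<Criteria, Scenarios> = {
  name: "spec",
  run: (acs) => acs.map((ac) => `Scenario: ${ac}`),
};
const testWriter: Stage<Scenarios, Specs> = {
  name: "test",
  run: (scenarios) =>
    scenarios.map((scenario, i) => ({ file: `tests/e2e/ac-${i + 1}.spec.ts`, scenario })),
};
const validator: Stage<Specs, Report> = {
  name: "validator",
  run: (specs) => ({ total: specs.length, files: specs.map((s) => s.file) }),
};

// Wire the stages together; only artefacts cross the boundaries.
function runFactory(ticketText: string): Report {
  const criteria = researcher.run(ticketText);
  const scenarios = specWriter.run(criteria);
  const specs = testWriter.run(scenarios);
  return validator.run(specs);
}

console.log(runFactory("Apply valid promo\nReject expired promo\n"));
```

The types make the contract explicit: the validator cannot see the Jira text even by accident, which is the same isolation the subagent pipeline is meant to give you.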
When the context gets crowded, /compact or split into subagents.
When your repo has 12 apps and 8 test suites
Claude Code in a monorepo.
One Claude session can scope multiple packages via /add-dir. Per-package CLAUDE.md overrides the root one. Per-package skills live under packages/<app>/.claude/skills/. Run diff-aware tests so you don't smoke-test the whole world for a 1-file fix.
    ## Repo · Turborepo monorepo

    Layout:
      apps/web/      Next.js app
      apps/admin/    admin dashboard
      apps/api/      Express API
      packages/ui/   shared components
      packages/qa/   shared Playwright fixtures

    Each app has its own CLAUDE.md. ALWAYS read the package-level
    CLAUDE.md before editing inside that package.

    Diff-aware test command:
      turbo test --filter='[origin/main]'

    CI runs only changed packages.

    > /add-dir apps/web
    > /add-dir packages/qa
    > the new apps/web/PricingCard component reuses fixtures from packages/qa.
      Read both CLAUDE.md files, then write a Playwright spec at
      apps/web/tests/pricing.spec.ts that imports the shared checkoutFixture.
      Run only that one spec: turbo test --filter=web -- --grep=pricing
Per-package .claude/skills/ beats root-level — keeps locator policy local.
Monorepo cookbook
| Pattern | Recipe |
|---|---|
| Diff-aware CI | turbo test --filter='[origin/main]' · only packages that changed. |
| Per-package skills | packages/web/.claude/skills/locator-auditor/ · stays out of apps/api. |
| Shared fixtures | packages/qa · import via @org/qa-fixtures in every test app. |
| Cross-package refactor | Run Claude with /add-dir on every affected package · one PR, one diff. |
| Selective Playwright projects | playwright.config.ts · one project per app, filter via --project=web. |
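As a rough illustration of the diff-aware idea, the sketch below maps changed file paths to the workspace folders that own them. It assumes the `apps/` and `packages/` layout shown earlier; `affectedPackages` is a hypothetical helper, and unlike Turborepo's own filter it knows nothing about the dependency graph between packages.

```typescript
// Assumption: workspaces live one level below these roots, as in the layout above.
const WORKSPACE_ROOTS = ["apps", "packages"];

// Return the unique workspace folders touched by a list of changed files.
function affectedPackages(changedFiles: string[]): string[] {
  const hits = new Set<string>();
  for (const file of changedFiles) {
    const parts = file.split("/");
    const root = parts[0] ?? "";
    const name = parts[1];
    // Files outside a workspace root (README.md, CI config) are ignored.
    if (WORKSPACE_ROOTS.includes(root) && name !== undefined && parts.length > 2) {
      hits.add(`${root}/${name}`);
    }
  }
  return Array.from(hits).sort();
}
```

Feeding it the output of `git diff --name-only origin/main` would give a first guess at which suites to run for a one-file fix.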
Migration playbook · don't rewrite by hand
From Cypress · Selenium · TestCafe → Playwright.
Four-step playbook. Claude does 90% of the mechanical work · you review the diff. Cite the selenium-to-playwright skill from qaskills.sh in your prompt and it follows the official translation map.
Audit
Claude inventories every spec, groups by complexity, flags brittle ones.
Convert
Spec-by-spec mechanical translation · locators, waits, hooks.
Quarantine
Anything Claude isn't 95% sure about lands in @flaky tag for human review.
Cut over
Run both suites on CI for 1 week · drop old when delta = 0.
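The cut-over rule needs a definition of "delta". One possible reading, sketched below, counts scenarios the new suite is missing plus scenarios where the two suites disagree. The `SuiteRun` shape and the `cutoverDelta` name are assumptions made for illustration, not part of any Cypress or Playwright API.

```typescript
type Outcome = "passed" | "failed";

// Hypothetical shape: scenario id mapped to its outcome in one CI run.
type SuiteRun = Record<string, Outcome>;

function cutoverDelta(oldRun: SuiteRun, newRun: SuiteRun) {
  const ids = Object.keys(oldRun);
  // Scenarios covered by the old suite that were never migrated.
  const missing = ids.filter((id) => !(id in newRun));
  // Scenarios present in both suites but with different outcomes.
  const disagreeing = ids.filter((id) => id in newRun && oldRun[id] !== newRun[id]);
  return { missing, disagreeing, delta: missing.length + disagreeing.length };
}
```

Under this definition the old suite is safe to drop only after `delta` has stayed at 0 for the whole parallel-run week.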
Locator translation matrix
| Cypress / Selenium | Playwright | Why |
|---|---|---|
| cy.get('[data-test=login]') | page.getByTestId('login') | Built-in test-id locator, no CSS lookup · set testIdAttribute: 'data-test' in playwright.config.ts, since the default attribute is data-testid. |
| cy.contains('Submit').click() | page.getByRole('button', { name: 'Submit' }).click() | Role-based · screen-reader-equivalent. |
| cy.intercept('POST', '/api/users') | page.route('/api/users', …) | Native request interception. |
| cy.wait('@createUser') | await page.waitForResponse(r => r.url().includes('/users')) | Promise-based · no aliases. |
| driver.findElement(By.xpath('//button[…]')) | page.getByRole('button', { name: /…/ }) | Auto-wait + a11y tree. |
| Thread.sleep(2000) | await expect(locator).toBeVisible() | Auto-waiting · no sleeps. |
| beforeEach hook | test.beforeEach hook · or a named fixture | Fixtures compose better. |
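To make "mechanical translation" concrete, here is a toy rule-based translator covering two rows of the matrix. It is a sketch only: the `RULES` list and `translateLine` are hypothetical, the second rule assumes the clicked element is a button (as the matrix row does), and a real migration would work on the syntax tree rather than on regular expressions.

```typescript
interface Rule {
  pattern: RegExp;
  replace: string;
}

const RULES: Rule[] = [
  // cy.get('[data-test=login]') -> page.getByTestId('login')
  { pattern: /cy\.get\('\[data-test=([\w-]+)\]'\)/g, replace: "page.getByTestId('$1')" },
  // cy.contains('Submit').click() -> role-based click (assumes a button)
  {
    pattern: /cy\.contains\('([^']+)'\)\.click\(\)/g,
    replace: "page.getByRole('button', { name: '$1' }).click()",
  },
];

function translateLine(line: string): { output: string; needsReview: boolean } {
  let output = line;
  for (const rule of RULES) {
    output = output.replace(rule.pattern, rule.replace);
  }
  // Any Cypress call left over was not covered by a rule: flag it for a human.
  const needsReview = /\bcy\./.test(output);
  return { output, needsReview };
}
```

Lines that come back with `needsReview: true` are the ones to quarantine, and the translated output still needs `await` added by hand or by a later pass.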
> migrate cypress/e2e/ to Playwright at tests/e2e/.
  Use the selenium-to-playwright skill from qaskills.sh.

  Rules:
  - Use getByRole / getByLabel / getByTestId first.
  - No waitForTimeout. Replace with expect-based waits.
  - cy.intercept → page.route. cy.task → fixture method.
  - Anything you're not 95% sure about, tag @flaky + add a TODO.

  After conversion, run the new suite locally and report:
  - total specs converted
  - pass count / fail count / @flaky count
  - diff in run-time vs old Cypress suite
Aliased intercepts
Cypress @alias doesn't map cleanly · use waitForResponse with URL matcher instead.
Custom commands
Cypress Cypress.Commands.add → Playwright fixtures · not 1:1. Build a fixture file.
iframe handling
Cypress can't deeply traverse iframes natively; Playwright can via frameLocator. Better outcome.
Numbers that drive QA decisions
Flake rate · MTBF · p95 — let Claude do the math.
Claude can ingest your Playwright JSON reporter output, compute the stability metrics that matter, and rank specs by flake risk. Wire it into CI · fail the build if flake rate goes above 2%.
The four numbers
Failed-then-passed-on-retry %
If > 2% on main, your suite is rotting. Track per-spec and per-package.
Mean time between flakes
Runs between flake events. Useful for noisy specs · alert when it drops.
p95 duration
95th percentile spec runtime. Bisect anything spiking 2× week-over-week.
Pass rate over 30 runs
If < 98% on a non-WIP spec, quarantine or fix · don't ship more on top.
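The four numbers are simple arithmetic over a spec's run history. The sketch below uses a hypothetical `SpecRun` record and makes two assumptions worth stating: "mean time between flakes" is taken as total runs divided by flake events, and a run that passed on retry still counts toward the pass rate.

```typescript
type RunStatus = "passed" | "failed" | "flaky"; // flaky = failed, then passed on retry

interface SpecRun {
  status: RunStatus;
  durationMs: number;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] ?? 0;
}

function specMetrics(runs: SpecRun[]) {
  const total = runs.length;
  const flaky = runs.filter((r) => r.status === "flaky").length;
  const failed = runs.filter((r) => r.status === "failed").length;
  const pct = (n: number) => (total === 0 ? 0 : (n * 100) / total);
  return {
    flakeRatePct: pct(flaky),
    // Assumption: average number of runs per flake event.
    runsBetweenFlakes: flaky === 0 ? Infinity : total / flaky,
    p95DurationMs: percentile(runs.map((r) => r.durationMs), 95),
    // Assumption: retried-then-passed runs count as passes.
    passRatePct: pct(total - failed),
  };
}
```

For example, 20 runs with one flake and one hard failure give a 5% flake rate and a 95% pass rate, so the spec breaches both the 2% and the 98% thresholds.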
---
description: Read the last 30 Playwright JSON reports, compute flake rate, rank flakiest specs, propose fixes for top 5.
argument-hint: [days]
---
You have $ARGUMENTS days of JSON reports under .playwright/reports/.

For each spec, compute:
- total runs
- failed-then-passed-on-retry count
- flake rate (= failed-then-passed-on-retry count / total runs * 100)
- p95 duration

Sort by flake rate desc. For the top 5, read the spec file and propose 1-line fixes.
Output as a markdown table + a fix list.
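# after the playwright job
- name: Flake gate
  run: |
    rate=$(node scripts/flake-rate.js)
    echo "flake rate: ${rate}%"
    if (( $(echo "$rate > 2.0" | bc -l) )); then
      echo "::error::Flake rate ${rate}% exceeds 2%"
      exit 1
    fi

// scripts/flake-rate.js
const path = require('node:path');
// Assumes the JSON reporter wrote playwright-report.json in the directory the job runs from.
const { suites } = require(path.resolve('playwright-report.json'));
// Suites nest (file → describe block), so collect specs recursively.
const collect = s => [...(s.specs ?? []), ...(s.suites ?? []).flatMap(collect)];
const all = suites.flatMap(collect);
// A test marked 'flaky' failed first and then passed on retry.
const flaky = all.filter(s => s.tests.some(t => t.status === 'flaky')).length;
console.log((all.length ? (flaky / all.length) * 100 : 0).toFixed(2));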
> /report-flake 7
also: produce a 5-bullet exec summary for the QA channel —
start with the headline number, then the biggest mover,
then top-3 specs to fix this week. Format for Slack.
When your test framework has no public docs
Teach Claude your in-house harness.
Three ways to teach Claude a framework it's never seen — pick one or stack all three. Pattern works for any internal CLI / DSL / fixture system.
Document the rules
Locator policy, fixture system, naming conventions in a "## AcmeTest" section of CLAUDE.md. Loaded every turn.
Show input → output pairs
Drop 8-12 example specs in .claude/skills/acmetest/examples/. Claude few-shots from them.
Wrap your CLI
~50 lines of JSON-RPC turns acme-test run --spec X into a real MCP tool Claude can call.
## AcmeTest · our in-house test harness

Locators are NEVER raw CSS. Always use: acme.locate('@:')
Fixtures live in tests/_fixtures/ and are auto-injected · NEVER instantiate manually.
Setup hook is beforeScenario, not beforeEach.
Assertions use acme.expect(x).toMatchSnapshot(); DO NOT import @playwright/test.

CLI:
  acme-test run --spec=tests/login.acme
  acme-test list-fixtures
  acme-test snapshot --update --spec=…

Reports land in .acme/reports/<ts>/.
Trace files at .acme/traces/<ts>/<spec>.trace.
Minimal custom MCP wrapper (Node)
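// Sketch against the McpServer API of @modelcontextprotocol/sdk 1.x · needs zod installed
// and an ESM context ("type": "module") for the imports and top-level await.
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { z } from 'zod';

// execFile takes arguments as an array, so a spec path cannot inject shell syntax.
const run = promisify(execFile);
const server = new McpServer({ name: 'acme-test', version: '1.0.0' });

server.tool(
  'acme_run',
  'Run an AcmeTest spec and return the JSON report.',
  { spec: z.string() },
  async ({ spec }) => {
    const { stdout } = await run('acme-test', ['run', `--spec=${spec}`, '--json']);
    return { content: [{ type: 'text', text: stdout }] };
  }
);

server.tool('acme_list_fixtures', 'List all available fixtures.', async () => {
  const { stdout } = await run('acme-test', ['list-fixtures']);
  return { content: [{ type: 'text', text: stdout }] };
});

// Speak MCP over stdin/stdout so Claude Code can spawn this file directly.
await server.connect(new StdioServerTransport());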
{
"mcpServers": {
"acme-test": {
"command": "node",
"args": [".claude/mcp/acme-test/server.js"]
}
}
}
Day-by-day playbook for week one
Onboard a new QA to Claude Code in 5 days.
Hand a printable checklist to every new hire. By Friday they ship their first PR. By week two they're authoring custom skills.
| Day | Task | Outcome |
|---|---|---|
| Mon · Setup | npm i -g @anthropic-ai/claude-code · clone the test repo · /init · read root CLAUDE.md end-to-end · install team skills via npx @qaskills/cli add <skill> | Working environment · CLAUDE.md memorised · 6 team skills installed |
| Tue · Read | Pair with a senior on /explore · ask Claude to walk the test directory · run /flaky tests/checkout.spec.ts together · watch the workflow | Understands repo shape · knows where fixtures + POM live |
| Wed · First PR | Pick a Jira ticket from the "good-first-bug" lane · ask Claude to generate Gherkin + Playwright skeleton · finish, run, push · /review before opening the PR | First PR opened with green CI |
| Thu · Tools | Author a personal custom command in .claude/commands/me/<name>.md · try Plan mode (⇧Tab) on a scarier task · run /security-review on yesterday's PR | Knows custom commands · plan mode · security gates |
| Fri · Shadow | Shadow the bug-bash · use Vision (drop screenshots) to repro 3 bugs · file each via Jira MCP · pair on /ultrareview for the senior's PR | Familiar with Vision · Jira MCP · review workflows |
The buddy prompt template
> /btw I'm a new QA on this repo. I'm trying to understand why
  tests/checkout/payment.spec.ts uses a custom fixture instead of the
  shared one in packages/qa. Read both files, give me a 4-bullet
  explanation a junior tester can follow. Do not change any code.

# /btw runs the question as a side-thread so the senior's
# main session isn't polluted.
"It edited the wrong file"
Forgot to /clear between tasks · old context bled in. Always clear between unrelated scopes.
"My CI bill went up"
Ran --continue on yesterday's huge session. Use /compact instead, or start fresh.
"The agent is hallucinating tests"
No CLAUDE.md rules on locators. Spend 10 min writing them · save 10 hours of fights.
Week-2 KPIs (manager checklist)
✓ Authored at least one custom slash command in .claude/commands/
✓ Shipped 3+ PRs with /review + /security-review green
✓ Filed 5+ bugs with Vision + Jira MCP · no manual repro steps
✓ Can explain to a peer what a Skill is and which 3 their team relies on
✓ Knows when to use plan mode vs auto-accept vs worktrees
Hands-on · run these now
Six drills. Do them in order.
First contact
Pick any repo. Run claude. Type /init. Read the generated CLAUDE.md. Edit it to reflect your rules.
Codegen a spec
Ask Claude to drive saucedemo.com via Playwright MCP, complete a checkout, and emit a spec. Run it. Expect green on first try.
Tame a flake
Find a spec with a waitForTimeout. Ask Claude to remove all sleeps and prove stability with --repeat-each=20.
Bug from screenshot
Drop a UI bug screenshot. Ask Claude to repro at the right viewport, capture a trace, and draft a Jira-ready bug write-up.
Portfolio scaffold
Run the mega-prompt from the project chapter. End with a working localhost:3000 and a passing Playwright suite.
Author /flaky
Build the custom /flaky command in .claude/commands/. Commit it. Run it on a real spec. Share with a teammate.
Capstone · channels → skill → site → live URL
Your channels become your portfolio.
You already produce QA content across 6 channels. We'll teach Claude to read them, package you as a Skill, generate a portfolio site, and ship it to app.thetestingacademy.com/masterclass/ClaudeCode.html.
Four prompts. Zero hand-coding. Ends with a live URL + Playwright suite + Lighthouse gate + CI pipeline.
Test engineer.
Builder of QA crews.
240k+ engineers learn QA from my YouTube. I break, write, ship, repeat.
Step 1 · package yourself as a Skill
Aggregate 6 channels into one Skill.
The Skill becomes the single source of truth — every later prompt reads from it. Whenever you publish new content, you re-run the Skill and the site updates itself.
> create a new skill at .claude/skills/pramod-me/SKILL.md that packages me as data.
  Aggregate the following channels and produce a single data.json file
  in the same folder:

  - GitHub    → https://github.com/promode
  - LinkedIn  → https://www.linkedin.com/in/thetestingacademy
  - Blog      → https://scrolltest.com
  - YouTube   → https://www.youtube.com/@TheTestingAcademy
  - Instagram → https://instagram.com/thetestingacademy
  - Medium    → https://medium.com/@thetestingacademy

  For each channel:
  1. Fetch the public page with WebFetch.
  2. Extract: handle, follower count if visible, top 6 items
     (repo / post / video / reel / article) with title, url, date.
  3. Pull a one-line bio + headshot URL where available.

  Then in SKILL.md frontmatter:
    name: pramod-me
    description: Use whenever the user needs current data about Pramod Dutta
      (bio, top repos, latest videos, recent talks, newest blog posts)
      to render a personal site or CV.
    allowed-tools: WebFetch, Read, Write, Bash

  Procedure section must:
  - Read data.json if <= 7 days old, otherwise refresh from sources.
  - Expose 5 helpers: getBio(), getTopRepos(), getLatestVideos(),
    getRecentPosts(), getTalks().
  - Cache to data.json. Pretty-print, ASCII-only, sorted by date desc.

  Verify by running the skill end-to-end and pasting a summary of the JSON
  it produced. Stop if any fetch fails and ask before retrying.
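The two data rules in that prompt (reuse the cache for up to 7 days, sort by date descending) can be sketched as below. The `fetchedAt` field and the `CachedData` shape are assumptions: the prompt does not say how data.json records its age, and the generated Skill may do it differently.

```typescript
interface ChannelItem {
  title: string;
  url: string;
  date: string; // ISO 8601, so plain string comparison orders correctly
}

// Hypothetical cache shape; `fetchedAt` is an assumed field.
interface CachedData {
  fetchedAt: string;
  items: ChannelItem[];
}

const MAX_AGE_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the cache is at most `maxAgeDays` old relative to `now`.
function isFresh(cache: CachedData, now: Date, maxAgeDays: number = MAX_AGE_DAYS): boolean {
  const ageMs = now.getTime() - new Date(cache.fetchedAt).getTime();
  return ageMs >= 0 && ageMs <= maxAgeDays * MS_PER_DAY;
}

// Newest items first, without mutating the cached array.
function latest(items: ChannelItem[], count: number): ChannelItem[] {
  return items
    .slice()
    .sort((a, b) => b.date.localeCompare(a.date))
    .slice(0, count);
}
```

A helper such as getLatestVideos() would then reduce to a freshness check followed by `latest(items, 6)`.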
Step 2 · scaffold + content + style
One mega-prompt → full Next.js site.
> using the pramod-me skill, scaffold a Next.js 15 portfolio at ./qa-portfolio.
  Requirements:

  STACK
  - Next 15 (app router), TypeScript strict, Tailwind, MDX.
  - Inter (sans) + JetBrains Mono. Teal accent #1a7c79.
  - Light theme, doc-style layout. No purple gradients, no neon.

  ROUTES
  /          hero · headline · 3 chip tags · CTA "watch on YouTube"
  /work      top 6 repos pulled from getTopRepos()
  /talks     talks + slide decks
  /writing   latest blog + Medium articles via getRecentPosts()
  /videos    embed latest 6 YouTube videos via getLatestVideos()
  /social    LinkedIn / Instagram cards, last 3 posts each
  /contact   email · calendar embed · X / GitHub links

  COMPONENTS
  - <ChannelStrip /> at footer · 6 icons → 6 URLs.
  - <Card /> with title, date, source-badge, hover lift.
  - <Hero /> · pulls from getBio() at build time.

  DATA
  - Read from .claude/skills/pramod-me/data.json.
  - Statically render /work /talks /writing /videos from server components
    (getStaticProps is pages-router only and does not work in the app router).
  - ISR every 24h on prod (export const revalidate = 86400).

  SEO + META
  - OpenGraph image generated dynamically per route.
  - JSON-LD Person schema on home.
  - sitemap.xml + robots.txt.

  After scaffold finishes, run pnpm dev and confirm the home page renders
  without console errors. Take a screenshot at 390x844 and 1440x900, save
  them as docs/hero-mobile.png and docs/hero-desktop.png. Then stop and report.
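The "JSON-LD Person schema on home" requirement boils down to emitting one small object. A minimal sketch follows, assuming a hypothetical `Bio` shape for what getBio() returns; the property names on the output (`name`, `jobTitle`, `url`, `sameAs`) are standard schema.org Person properties.

```typescript
// Hypothetical shape for the Skill's bio data.
interface Bio {
  name: string;
  headline: string;
  siteUrl: string;
  channels: string[];
}

// Build the object that gets serialised into a
// <script type="application/ld+json"> tag on the home page.
function personJsonLd(bio: Bio) {
  return {
    "@context": "https://schema.org",
    "@type": "Person",
    name: bio.name,
    jobTitle: bio.headline,
    url: bio.siteUrl,
    sameAs: bio.channels,
  };
}
```

Listing the channel URLs under `sameAs` is what tells a crawler that the portfolio and the social profiles describe the same person.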
Step 3 · Playwright + Lighthouse + GHA
Cover every route. Gate every merge.
> add a Playwright suite + a CI pipeline to ./qa-portfolio.

  PLAYWRIGHT
  - Install @playwright/test + @axe-core/playwright.
  - 3 projects: mobile (390x844 · Pixel), tablet (768x1024 · iPad),
    desktop (1440x900 · Chromium).
  - Suites:
    tests/smoke/    every route returns 200 + has <h1>
    tests/a11y/     AxeBuilder analyze() on every route, 0 critical issues
    tests/visual/   screenshot per route, threshold 0.2
    tests/links/    crawl all links from /, fail on 404
    tests/seo/      meta description + og:image present per route
  - Trace on retry, screenshot always, video on failure.
  - HTML reporter on CI.

  LIGHTHOUSE
  - lhci autorun on prod URL after deploy.
  - Gates: perf 90, a11y 95, best-practices 95, seo 100.
  - Fail the job below threshold.

  CI · .github/workflows/qa.yml
  on: pull_request, push to main
  jobs:
    lint        · eslint + prettier
    typecheck   · tsc --noEmit
    e2e         · playwright on 3 viewports
    lighthouse  · lhci autorun
    build       · pnpm build
  matrix the e2e job by project. shard 4 ways.

  Run the full suite locally. If anything fails, fix it. When green,
  commit each layer as its own conventional commit.
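The Lighthouse gate is a threshold comparison per category. The sketch below only illustrates that arithmetic; Lighthouse CI has its own assertion config and this is not it. It assumes category scores arrive on a 0 to 1 scale, as they do in Lighthouse's JSON report, and `failedGates` is a hypothetical helper name.

```typescript
// Thresholds from the prompt above, on a 0-100 scale.
const GATES: Record<string, number> = {
  performance: 90,
  accessibility: 95,
  "best-practices": 95,
  seo: 100,
};

// Return the categories that miss their gate. A missing score counts as a failure.
function failedGates(scores: Record<string, number | undefined>): string[] {
  return Object.entries(GATES)
    .filter(([category, minimum]) => {
      const score = scores[category];
      if (score === undefined) return true;
      return Math.round(score * 100) < minimum;
    })
    .map(([category]) => category);
}
```

A CI step would exit non-zero whenever the returned list is non-empty and print the list so the failing category is visible in the job log.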
Step 4 · Vercel + Testing Academy upload
From localhost to thetestingacademy.com.
> deploy ./qa-portfolio to vercel.
  - run vercel login if needed (hand me the OTP prompt).
  - project name: pramod-qa
  - link to repo github.com/promode/qa-portfolio
  - env vars (encrypted):
      ANTHROPIC_API_KEY
      YOUTUBE_API_KEY
      MEDIUM_RSS_URL
  - run vercel --prod.
  - print preview + prod URLs.
  - run playwright suite against the PROD url; paste HTML report.
  - finally, attach custom domain: pramod.thetestingacademy.com
> publish this masterclass deck to
  app.thetestingacademy.com/masterclass/ClaudeCode.html.
  Steps:
  - rename index.html → ClaudeCode.html.
  - SSH/SFTP to the app host using creds from ~/.ssh/tta_deploy.
  - place under /var/www/app/masterclass/.
  - chmod 644, chown www-data.
  - reload nginx.
  - verify with curl + a Playwright snapshot of the live URL.
  - commit the deck source to github.com/thetestingacademy/masterclass-decks
    on a new branch · open a PR.
| Live URL after this prompt | Owner | Purpose |
|---|---|---|
| pramod.thetestingacademy.com | Vercel | Personal portfolio · auto-rebuild on push. |
| app.thetestingacademy.com/masterclass/ClaudeCode.html | Nginx | This deck · shareable for the class. |
| github.com/promode/qa-portfolio | GitHub | Source · CI gates every PR. |
Live demo · fill the form → see your portfolio appear
Try the portfolio generator right here.
Fill in your channels. Click Generate. A complete single-file portfolio renders in the preview pane on the right. Download it, deploy it. This is the same HTML Claude Code produces in the capstone — only here it runs in your browser so you can preview it instantly.
After download · ship it to Vercel
# 1. install vercel cli once
npm i -g vercel

# 2. make a folder, drop the file in
mkdir my-qa-portfolio && cd my-qa-portfolio
mv ~/Downloads/portfolio.html ./index.html

# 3. push to vercel · prod
vercel --prod
# follow prompts → pick project name → done.
# you get a https URL in ~20s.

# add a custom domain
vercel domains add pramod.thetestingacademy.com
vercel alias set <deployment-url> pramod.thetestingacademy.com

# add CI: redeploy on every push
vercel link
git init && git add . && git commit -m "init"
gh repo create pramod-qa --public --source=. --push
# vercel auto-detects the repo and
# rebuilds on every push to main.
Print-friendly reference · pin this above your desk
Claude Code · full cheat sheet.
Every keybinding, slash command, skill knob, hook tag, MCP transport, env var. Lifted from Claude Code v2.1+ release notes. Use ⌘P to print this page; the grid below collapses cleanly to A4.
General controls
Mode switching
Mac · Option as Meta
Input
Prefixes
| Prefix | Meaning |
|---|---|
| / | Slash command |
| ! | Direct bash |
| @ | File mention + autocomplete |
| # | Append note to CLAUDE.md |
Session
| Command | What it does |
|---|---|
| /help | List every command + skill |
| /init | Create CLAUDE.md from repo |
| /clear | Wipe conversation, keep cwd |
| /compact | Summarise history, free context |
| /resume | Resume old session by id |
| /status | Session state + token budget |
| /cost | Spend so far this session |
| /export | Save conversation to file |
| /rename | Rename current session |
| /btw [q] | Side question without polluting context |
| /insights | Analyze sessions report |
| /extra-usage | Extra usage when rate-limited |
| /usage | Per-category cost breakdown |
Config & tools
| Command | What it does |
|---|---|
| /config | Open settings · theme, model, perms |
| /model | Switch model mid-session |
| /permissions | Manage allowed tools / commands |
| /add-dir | Bring another folder into scope |
| /memory | Edit CLAUDE.md inline |
| /hooks | Manage hooks |
| /mcp | Manage MCP servers |
| /agents | Manage agent configs |
| /skills | List available skills |
| /keybindings | Customize keyboard shortcuts |
| /terminal-setup | Configure terminal keybindings |
| /scroll-speed | Adjust output scroll speed |
| /ide | IDE integrations status |
| /voice | Push-to-talk voice dictation |
| /doctor | Diagnose installation |
| /upgrade | Update CLI to latest |
| /release-notes | What's new in your version |
| /feedback | Submit feedback (alias /bug) |
| /desktop | Continue in Desktop app |
| /login · /logout | Auth handling |
Review
| Command | What it does |
|---|---|
| /review [PR] | Review PR locally |
| /ultrareview [PR#] | Cloud multi-agent review |
| /security-review | Scan diff for vulnerabilities |
| /code-review | Effort levels: low / med / high |
| /pr-comments | Pull PR review comments |
Scheduling & remote
| Command | What it does |
|---|---|
| /loop [interval] [prompt] | Recurring task |
| --remote | Web session via claude.ai |
| cat file \| claude -p | Pipe input · headless one-shot |
Config files
| File | Purpose |
|---|---|
| ~/.claude/settings.json | User settings |
| .claude/settings.json | Project shared settings |
| .claude/settings.local.json | Local only · gitignored |
| ~/.claude.json | OAuth · MCP · state |
| .mcp.json | Project MCP servers |
| managed-settings.d/ | Drop-in policy fragments |
Key settings
| Setting | Meaning |
|---|---|
| modelOverrides | Map model picker → custom IDs |
| autoMode.hard_deny | Unconditional auto-mode deny rules |
| hooks: if | Conditional hooks (permission rule syntax) |
| DISABLE_PROMPT_CACHING | Warn at startup if cache disabled |
| Monitor tool | Stream events from background scripts |
| PermissionDenied | Hook · auto-model denial |
| showThinkingSummaries | Opt-in (off by default) |
| hooks: "defer" | Pause headless → resume later |
| type: "mcp_tool" | Hook step invokes MCP tool |
| continueOnBlock | Hook config · keep running after blocked tool |
| disableSkillShellExec | Block !`cmd` |
| refreshInterval | Re-run custom status N sec |
Key env vars
| Variable | Meaning |
|---|---|
| ANTHROPIC_API_KEY | API key |
| ANTHROPIC_MODEL | Default model |
| ANTHROPIC_BASE_URL | Proxy / gateway override |
| ANTHROPIC_BETAS | Additional beta headers |
| ANTHROPIC_CUSTOM_MODEL_OPTION | Custom /model entry |
| MAX_THINKING_TOKENS | 0 = off |
| ENABLE_PROMPT_CACHING_1H | Opt into 1h cache TTL |
Memory & CLAUDE.md
| File | Scope |
|---|---|
| ./CLAUDE.md | Project (team-shared) |
| ./CLAUDE.local.md | Local personal notes · gitignored |
| ~/.claude/CLAUDE.md | Personal · all projects |
| /etc/claude-code/CLAUDE.md | Managed policy (Linux/WSL · org-wide) |
MCP transport flags
| Flag | Meaning |
|---|---|
| --transport http | Remote HTTP (recommended) |
| --transport stdio | Local process |
| --transport sse | Remote SSE |
Built-in skills
| Skill | What it does |
|---|---|
| Skill tool | Discovers built-in slash commands (/init, /review, /security-review…) |
| /code-review | Code review · low / med / high effort |
| /batch | Large parallel changes · 5–30 worktrees |
| /debug [desc] | Troubleshoot from debug log |
| /loop [interval] | Recurring scheduled task |
| /claude-api | Load API + SDK reference |
Custom skill locations
| Path | Scope |
|---|---|
| .claude/skills/<name>/ | Project skills |
| ~/.claude/skills/<name>/ | Personal skills |
Skill frontmatter
| Key | Meaning |
|---|---|
| description | Auto-invocation trigger phrase |
| allowed-tools | Skip permission prompts |
| model | Override model for skill |
| effort | Override effort level |
| paths: [globs] | Path-specific (YAML list) |
| context: fork | Run in subagent |
| $ARGUMENTS | User input placeholder |
| ${CLAUDE_SKILL_DIR} | Skill's own dir |
| ${CLAUDE_EFFORT} | Current effort level (skill var) |
| !`cmd` | Dynamic context injection |
| plugin bin/ | Ship executables for Bash tool |
Built-in agents
| Agent | Role |
|---|---|
| Explore | Read-only Haiku · locator agent |
| Plan | Research for plan mode |
| General | Full tools · complex tasks |
| Bash | Terminal separate context |
Agent frontmatter
| Key | Meaning |
|---|---|
| permissionMode | default / acceptEdits / plan / dontAsk / bypassPermissions |
| isolation: worktree | Run in git worktree |
| memory: user\|project\|local | Persistent memory scope |
| background: true | Background task |
| maxTurns | Limit agentic turns |
| initialPrompt | Auto-submit first turn |
| SendMessage | Resume agents (replaces resume) |
| @agent-name | Mention named subagent |
Don't do these. Ever.
Letting the agent run on main
Use worktrees or a feature branch. Never let auto-mode write to your protected branch.
Trusting "I ran the tests" without proof
Ask for the exit code or the report path. Agents can hallucinate green.
One giant prompt for everything
Break the task. Plan first. Execute second. Verify third. Smaller turns = better diffs.
Skipping CLAUDE.md
Without it the agent guesses your conventions and you fight it every turn. Spend 10 min, save 10 hours.
Pasting secrets in prompts
Use env vars. Use 1Password CLI. Never put a token where the agent — or its logs — can see it raw.
Auto-merging Claude's PRs
You read the diff. Always. The agent is the writer; you are the editor.
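Asking for proof instead of trusting "I ran the tests" can be as simple as reading the counts out of the report file. The sketch below assumes the `stats` block written by Playwright's JSON reporter (expected, unexpected, flaky, skipped); the `verdict` helper and its three labels are hypothetical.

```typescript
// Field names assumed from the `stats` block of Playwright's JSON report.
interface ReportStats {
  expected: number;
  unexpected: number;
  flaky: number;
  skipped: number;
}

function verdict(stats: ReportStats): "green" | "red" | "suspicious" {
  if (stats.unexpected > 0) return "red";
  // Zero executed tests is not a pass: the run may have matched no specs at all.
  if (stats.expected === 0) return "suspicious";
  // Passing only after retries deserves a human look before merge.
  if (stats.flaky > 0) return "suspicious";
  return "green";
}
```

The "suspicious" case matters most here, since an empty or fully skipped run also exits with code 0.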
The new QA toolbelt.
Claude Code
The conductor. Lives in your terminal, reads your repo, fires every other tool.
Playwright
The hands. Drives browsers and APIs. Default for E2E + API.
Playwright MCP
Bridge between agent and runner. Auto-locators via a11y tree.
Atlassian MCP
Jira + Confluence. Read AC, file bugs, post run reports.
GitHub + gh
PRs, checks, releases. Claude calls gh directly — no extra MCP.
Vercel
Preview-per-PR. Run Playwright against the preview before merge.
Trace + Lighthouse
Visual evidence + perf gates. Claude attaches both to every PR.
You
Strategy. Risk. Judgment. The only thing the agent can't replace.
Where to go next.
docs.claude.com / code
Official reference. Always the source of truth for flags + tools.
github · anthropics / claude-code
Issues, recipes, plugin authoring. Star it.
modelcontextprotocol.io
Spec + server registry. Find an MCP for almost anything.
playwright.dev
Auto-waiting, fixtures, traces. Master these and Claude works for you.
The Testing Academy
Newsletter, courses, the community where we keep learning together.
Your repo · tonight
Open a real QA repo. Type claude. Type /init. Begin.