testing-strategy
Comprehensive testing guidance covering test planning, TDD workflow, testing pyramid, and coverage targets. Ensures confidence through layered testing.
When & Why to Use This Skill
This Claude skill provides a comprehensive framework for implementing robust software testing strategies, focusing on the testing pyramid, Test-Driven Development (TDD), and the AAA (Arrange-Act-Assert) pattern. It offers actionable guidance on balancing unit, integration, and end-to-end tests to ensure high code quality, maintainability, and confidence during refactoring. By defining clear coverage targets and CI/CD integration practices, it helps development teams establish a professional quality assurance workflow.
Use Cases
- Architecting a balanced testing suite using the Testing Pyramid to optimize for both execution speed and system reliability.
- Implementing Test-Driven Development (TDD) to improve software design and catch logic errors before they reach production.
- Standardizing test code across a team using the Arrange-Act-Assert (AAA) pattern for better readability and maintenance.
- Establishing realistic code coverage targets and metrics for different components like business logic, APIs, and UI.
- Integrating automated test suites and coverage reporting into CI/CD pipelines (e.g., GitHub Actions) to prevent regressions.
- Applying advanced mocking and fixture strategies to isolate external dependencies like databases and third-party APIs.
Testing Strategy Skill
Core Principle
Confidence through layered testing.
Tests are not just about catching bugs - they enable confident refactoring, document expected behavior, and provide fast feedback during development. A good testing strategy balances thoroughness with maintainability, using the right test type for each validation need.
Effective testing:
- Catches bugs before production
- Enables safe refactoring
- Documents system behavior
- Provides fast feedback loops
- Scales with codebase growth
Testing Pyramid
The testing pyramid guides test distribution across layers:
```
              /\
             /  \         E2E (10%)
            /    \        - Full system tests
           /------\       - Slow, brittle, expensive
          /        \      - Critical user journeys only
         /          \
        /------------\    Integration (20%)
       /              \   - Component interactions
      /                \  - Database, API, services
     /------------------\ - Medium speed, moderate cost
    /                    \
   /                      \   Unit (70%)
  /                        \  - Individual functions
 /                          \ - Fast, reliable, cheap
/----------------------------\- Maximum coverage here
```
Unit Tests (70% of tests)
Purpose: Validate individual functions/methods in isolation
Characteristics:
- Fast (milliseconds per test)
- No external dependencies (databases, APIs, file system)
- Isolated (each test independent)
- Deterministic (same input = same output)
What to test:
- Pure functions (input → output)
- Business logic
- Edge cases and boundary conditions
- Error handling
- Data transformations
Example (Python):
```python
import pytest


def calculate_discount(price: float, discount_percent: float) -> float:
    """Calculate discounted price."""
    if price < 0:
        raise ValueError("Price cannot be negative")
    if not 0 <= discount_percent <= 100:
        raise ValueError("Discount must be between 0 and 100")
    return round(price * (1 - discount_percent / 100), 2)


# Unit tests
def test_calculate_discount_normal_case():
    assert calculate_discount(100.0, 10.0) == 90.0


def test_calculate_discount_zero_discount():
    assert calculate_discount(100.0, 0.0) == 100.0


def test_calculate_discount_full_discount():
    assert calculate_discount(100.0, 100.0) == 0.0


def test_calculate_discount_rounding():
    assert calculate_discount(99.99, 10.0) == 89.99


def test_calculate_discount_negative_price():
    with pytest.raises(ValueError, match="Price cannot be negative"):
        calculate_discount(-10.0, 10.0)


def test_calculate_discount_invalid_percent():
    with pytest.raises(ValueError, match="Discount must be between"):
        calculate_discount(100.0, 150.0)
```
Integration Tests (20% of tests)
Purpose: Validate interactions between components
Characteristics:
- Medium speed (seconds per test)
- Real dependencies (test database, external services)
- Test realistic scenarios
- More complex setup/teardown
What to test:
- Database queries and transactions
- API endpoint contracts
- Service-to-service communication
- Cache interactions
- File system operations (see the `tmp_path` sketch after the example below)
Example (JavaScript):
```javascript
// `request` is typically a supertest client and `app` is the HTTP app under test
// (both assumed to be imported at the top of the test file).
describe('User API', () => {
  let db;

  beforeEach(async () => {
    // Setup test database
    db = await createTestDatabase();
    await db.migrate();
  });

  afterEach(async () => {
    await db.cleanup();
  });

  it('creates user and stores in database', async () => {
    const userData = {
      name: 'Jane Doe',
      email: 'jane@example.com',
      role: 'user'
    };

    const response = await request(app)
      .post('/api/users')
      .send(userData)
      .expect(201);

    // Verify API response
    expect(response.body).toMatchObject({
      id: expect.any(String),
      name: 'Jane Doe',
      email: 'jane@example.com',
      role: 'user'
    });

    // Verify database state
    const user = await db.users.findById(response.body.id);
    expect(user.email).toBe('jane@example.com');
  });

  it('rejects duplicate email addresses', async () => {
    await db.users.create({
      name: 'John Doe',
      email: 'jane@example.com'
    });

    const response = await request(app)
      .post('/api/users')
      .send({
        name: 'Jane Doe',
        email: 'jane@example.com'
      })
      .expect(409);

    expect(response.body.error).toMatch(/email already exists/i);
  });
});
```
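File system operations (listed above) can often be tested against a real temporary directory rather than mocks. A minimal pytest sketch, assuming a hypothetical `save_report` helper that writes JSON into a given directory; `tmp_path` is pytest's built-in temporary-directory fixture:

```python
import json


def save_report(directory, name, data):
    """Hypothetical helper: writes `data` as JSON into `directory`."""
    path = directory / f"{name}.json"
    path.write_text(json.dumps(data))
    return path


def test_save_report_writes_json_file(tmp_path):
    # Arrange: tmp_path provides a fresh, isolated temporary directory
    data = {"total": 150.0, "items": 2}

    # Act
    path = save_report(tmp_path, "daily", data)

    # Assert: the file exists and the data round-trips
    assert path.exists()
    assert json.loads(path.read_text()) == data
```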
End-to-End Tests (10% of tests)
Purpose: Validate complete user workflows through the system
Characteristics:
- Slow (minutes per test)
- Full system (UI, backend, database, external services)
- Brittle (breaks with UI changes)
- Expensive to maintain
What to test:
- Critical user journeys (signup, checkout, core workflows)
- Cross-cutting features (authentication, authorization)
- Integration with third-party services
- Browser compatibility (if web app)
Example (Playwright):
```javascript
import { test, expect } from '@playwright/test';

test('user can complete checkout flow', async ({ page }) => {
  // Login
  await page.goto('/login');
  await page.fill('input[name="email"]', 'test@example.com');
  await page.fill('input[name="password"]', 'password123');
  await page.click('button[type="submit"]');

  // Add item to cart
  await page.goto('/products');
  await page.click('text=Product Name');
  await page.click('button:has-text("Add to Cart")');

  // Proceed to checkout
  await page.click('text=Cart');
  await page.click('text=Checkout');

  // Fill shipping info
  await page.fill('input[name="address"]', '123 Main St');
  await page.fill('input[name="city"]', 'San Francisco');
  await page.fill('input[name="zip"]', '94102');

  // Fill payment info (test mode)
  await page.fill('input[name="cardNumber"]', '4242424242424242');
  await page.fill('input[name="expiry"]', '12/25');
  await page.fill('input[name="cvc"]', '123');

  // Submit order
  await page.click('button:has-text("Place Order")');

  // Verify confirmation
  await expect(page.locator('text=Order Confirmed')).toBeVisible();
  await expect(page.locator('text=Order #')).toBeVisible();
});
```
Test-Driven Development (TDD)
Red-Green-Refactor workflow:
1. Red (Write Failing Test)
Write the test first, before implementation. Test should fail because functionality doesn't exist yet.
```python
# test_calculator.py
from calculator import Calculator  # doesn't exist yet, so the test fails


def test_add_two_numbers():
    calculator = Calculator()
    result = calculator.add(2, 3)
    assert result == 5  # FAILS - Calculator doesn't exist yet
```
2. Green (Make Test Pass)
Write minimal code to make the test pass. Don't worry about perfection yet.
```python
# calculator.py
class Calculator:
    def add(self, a, b):
        return a + b  # PASSES - Simplest implementation
```
3. Refactor (Improve Code)
Clean up code while keeping tests passing. Tests give you confidence to refactor safely.
```python
# calculator.py (refactored)
class Calculator:
    """Simple calculator for arithmetic operations."""

    def add(self, a: float, b: float) -> float:
        """Add two numbers and return the result."""
        if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
            raise TypeError("Arguments must be numbers")
        return a + b
```
TDD Benefits
- Design first: Writing tests forces you to think about API design
- Documentation: Tests document expected behavior
- Confidence: Tests enable safe refactoring
- Coverage: TDD naturally produces high test coverage
- Bug prevention: Catch bugs before they're written
When to Use TDD
Good fit:
- Complex business logic
- Bug fixes (write a failing regression test first, then fix; see the sketch below)
- API design (tests define the interface)
- Critical functionality (payment processing, security)
Poor fit:
- Spike/exploratory work (unknown requirements)
- Trivial code (getters/setters)
- UI experimentation (rapid iteration)
- Throwaway prototypes
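For bug fixes in particular, the red step is a regression test that reproduces the reported defect before any code changes. A minimal sketch with a hypothetical `average_rating` helper:

```python
# Red: reproduces the bug report ("average_rating([]) raises
# ZeroDivisionError") and fails against the current code.
def test_average_rating_empty_list_returns_zero():
    assert average_rating([]) == 0.0


# Green: the minimal fix that makes the regression test pass.
def average_rating(ratings: list[float]) -> float:
    if not ratings:
        return 0.0
    return sum(ratings) / len(ratings)
```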
Test Structure (AAA Pattern)
Arrange-Act-Assert pattern for clear, maintainable tests:
Arrange (Setup)
Prepare test data and system state
Act (Execute)
Run the code being tested
Assert (Verify)
Check the results match expectations
Example:
```python
def test_user_checkout_with_discount():
    # Arrange
    user = User(id=1, membership="premium")
    cart = ShoppingCart()
    cart.add_item(Product(id=101, price=100.0))
    cart.add_item(Product(id=102, price=50.0))
    checkout = CheckoutService(discount_calculator=PremiumDiscount())

    # Act
    total = checkout.calculate_total(user, cart)

    # Assert
    assert total == 135.0  # 150 - 10% premium discount
```
Test Naming Convention
Format: test_[unit]_[scenario]_[expected_result]
Good names:
```python
test_calculate_discount_with_zero_percent_returns_original_price()
test_user_login_with_invalid_password_raises_authentication_error()
test_get_user_by_id_when_not_found_returns_none()
```
Bad names:
```python
test_discount()   # What about discount?
test_case_1()     # What is case 1?
test_user()       # What about user?
```
Coverage Targets
Coverage is a metric, not a goal. 100% coverage doesn't mean bug-free code.
Recommended Coverage by Component Type
| Component Type | Target Coverage | Rationale |
|---|---|---|
| Business Logic | 90-100% | Critical functionality, high bug cost |
| API Endpoints | 80-90% | Public contracts, integration points |
| Data Models | 70-80% | Validation and constraints |
| Utilities | 80-90% | Reused across codebase |
| UI Components | 60-70% | High churn, manual testing viable |
| Configuration | 50-60% | Simple, low complexity |
Coverage Metrics
Line coverage: Percentage of code lines executed by tests
- Easy to measure, but can be gamed
- 80% line coverage is a reasonable target
Branch coverage: Percentage of code paths executed
- Better than line coverage (catches untested conditions; see the example below)
- 70% branch coverage is a good target
Mutation coverage: Tests that fail when code is mutated
- Most rigorous, catches weak assertions
- Advanced metric, use for critical code
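To see the difference between line and branch coverage, consider this sketch with a hypothetical `apply_member_discount` function: the single test executes every line (100% line coverage), yet the implicit false branch of the `if` is never taken, so branch coverage reports the decision as only partially covered.

```python
def apply_member_discount(price: float, is_member: bool) -> float:
    total = price
    if is_member:
        total *= 0.9  # 10% member discount
    return total


# Executes every line, but only the "is_member is True" branch.
def test_apply_member_discount_for_member():
    assert apply_member_discount(100.0, True) == 90.0
```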
What NOT to Test
Avoid testing:
- Third-party library internals (trust the library)
- Framework code (already tested)
- Trivial getters/setters (no logic to test)
- Private implementation details (test public interface)
- Generated code (already validated by generator)
Mocking and Fixtures
When to Mock
Mock external dependencies:
- Database connections
- API calls to external services
- File system operations
- Current time (for time-dependent logic)
- Random number generation (see the seeded-RNG sketch below)
Don't mock:
- Your own domain logic (test it directly)
- Simple data structures
- Pure functions (no side effects to mock)
Mocking Example (Python)
```python
from datetime import datetime
from unittest.mock import Mock, patch


def test_send_welcome_email_calls_email_service():
    # Arrange
    email_service = Mock()
    user_service = UserService(email_service=email_service)
    user = User(email='jane@example.com', name='Jane')

    # Act
    user_service.send_welcome_email(user)

    # Assert
    email_service.send.assert_called_once_with(
        to='jane@example.com',
        subject='Welcome!',
        body='Hi Jane, welcome to our service!'
    )


@patch('user_service.datetime')
def test_user_age_calculation(mock_datetime):
    # Arrange - Fix current time
    mock_datetime.now.return_value = datetime(2024, 1, 1)
    user = User(birth_date=datetime(2000, 1, 1))

    # Act
    age = user.calculate_age()

    # Assert
    assert age == 24
```
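The same idea applies to random number generation (listed under "When to Mock" above): either patch the random call or inject a seeded `random.Random` so results are reproducible. A minimal sketch of the injection approach, using a hypothetical `pick_winner` helper:

```python
import random


def pick_winner(entries: list[str], rng: random.Random) -> str:
    """Hypothetical helper that takes its randomness as a dependency."""
    return rng.choice(entries)


def test_pick_winner_is_deterministic_with_fixed_seed():
    entries = ['alice', 'bob', 'carol']
    # Same seed -> same choice, so the test never flakes
    assert pick_winner(entries, random.Random(42)) == pick_winner(entries, random.Random(42))
```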
Fixtures (Test Data Setup)
Pytest fixtures:
```python
import pytest


@pytest.fixture
def sample_user():
    """Reusable user instance for tests."""
    return User(
        id=1,
        name='Jane Doe',
        email='jane@example.com',
        role='user'
    )


@pytest.fixture
def db_session():
    """Database session with rollback."""
    session = database.create_session()
    yield session
    session.rollback()
    session.close()


def test_update_user_email(sample_user, db_session):
    # Arrange
    db_session.add(sample_user)

    # Act
    sample_user.update_email('newemail@example.com')
    db_session.commit()

    # Assert
    updated = db_session.query(User).get(1)
    assert updated.email == 'newemail@example.com'
```
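When tests need similar but not identical data, a factory-style fixture avoids duplicated setup. A sketch reusing the illustrative `User` model from above (the `make_user` fixture is hypothetical):

```python
@pytest.fixture
def make_user():
    """Factory fixture: each test builds exactly the user it needs."""
    def _make_user(role='user', **overrides):
        defaults = dict(id=1, name='Jane Doe', email='jane@example.com', role=role)
        defaults.update(overrides)
        return User(**defaults)
    return _make_user


def test_admin_role_is_assigned(make_user):
    admin = make_user(role='admin')
    assert admin.role == 'admin'
```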
Jest setup/teardown:
```javascript
describe('User Service', () => {
  let db;
  let userService;

  beforeAll(async () => {
    // Setup once for all tests
    db = await createTestDatabase();
  });

  afterAll(async () => {
    // Cleanup once after all tests
    await db.close();
  });

  beforeEach(() => {
    // Setup before each test
    userService = new UserService(db);
  });

  afterEach(async () => {
    // Cleanup after each test
    await db.clearAllTables();
  });

  test('creates user successfully', async () => {
    const user = await userService.create({
      name: 'Jane Doe',
      email: 'jane@example.com'
    });

    expect(user.id).toBeDefined();
  });
});
```
Language/Framework Specifics
Python (pytest)
Installation:
```bash
pip install pytest pytest-cov
```
Running tests:
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/test_user.py

# Run specific test
pytest tests/test_user.py::test_create_user

# Run with verbose output
pytest -v

# Run with debugging (stop on first failure, drop into the debugger)
pytest -x --pdb
```
Configuration (pytest.ini):
```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = --strict-markers --cov=src --cov-report=term-missing
```
Key features:
- Parametrized tests
- Fixtures for setup/teardown
- Markers for test categories (see the marker sketch below)
- Coverage reporting
Example:
```python
import pytest


@pytest.mark.parametrize("price,discount,expected", [
    (100.0, 10.0, 90.0),
    (100.0, 0.0, 100.0),
    (100.0, 100.0, 0.0),
    (99.99, 10.0, 89.99),
])
def test_calculate_discount(price, discount, expected):
    result = calculate_discount(price, discount)
    assert result == expected
```
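Markers (mentioned in the feature list above) tag tests so categories can be selected or excluded at run time. A minimal sketch:

```python
import pytest


@pytest.mark.slow
def test_full_report_generation():
    ...  # long-running test


# Register the marker in pytest.ini so --strict-markers accepts it:
#   [pytest]
#   markers =
#       slow: long-running tests excluded from the default run
#
# Then filter by category on the command line:
#   pytest -m slow          # only slow tests
#   pytest -m "not slow"    # everything except slow tests
```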
JavaScript/TypeScript (Jest)
Installation:
```bash
npm install --save-dev jest @types/jest
```
Running tests:
```bash
# Run all tests
npm test

# Run with coverage
npm test -- --coverage

# Run specific test file
npm test user.test.js

# Run in watch mode
npm test -- --watch

# Run with verbose output
npm test -- --verbose
```
Configuration (jest.config.js):
```javascript
module.exports = {
  testEnvironment: 'node',
  coverageDirectory: 'coverage',
  collectCoverageFrom: [
    'src/**/*.{js,ts}',
    '!src/**/*.d.ts',
  ],
  coverageThreshold: {
    global: {
      branches: 70,
      functions: 80,
      lines: 80,
      statements: 80,
    },
  },
};
```
Key features:
- Snapshot testing
- Mocking built-in
- Parallel test execution
- Code coverage reporting
Example:
```javascript
describe('calculateDiscount', () => {
  it.each([
    [100.0, 10.0, 90.0],
    [100.0, 0.0, 100.0],
    [100.0, 100.0, 0.0],
    [99.99, 10.0, 89.99],
  ])('calculates %d with %d%% discount as %d', (price, discount, expected) => {
    expect(calculateDiscount(price, discount)).toBe(expected);
  });

  it('throws error for negative price', () => {
    expect(() => calculateDiscount(-10, 10)).toThrow('Price cannot be negative');
  });
});
```
Rust (cargo test)
Running tests:
```bash
# Run all tests
cargo test

# Run with output visible
cargo test -- --nocapture

# Run specific test
cargo test test_calculate_discount

# Run with coverage (requires tarpaulin)
cargo tarpaulin --out Html
```
Key features:
- Tests in same file as code
- Documentation tests
- Integration tests in tests/ directory
- Built-in benchmarking
Example:
```rust
pub fn calculate_discount(price: f64, discount_percent: f64) -> Result<f64, String> {
    if price < 0.0 {
        return Err("Price cannot be negative".to_string());
    }
    if !(0.0..=100.0).contains(&discount_percent) {
        return Err("Discount must be between 0 and 100".to_string());
    }
    Ok((price * (1.0 - discount_percent / 100.0) * 100.0).round() / 100.0)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_calculate_discount_normal_case() {
        assert_eq!(calculate_discount(100.0, 10.0).unwrap(), 90.0);
    }

    #[test]
    fn test_calculate_discount_negative_price() {
        assert!(calculate_discount(-10.0, 10.0).is_err());
    }

    #[test]
    #[should_panic(expected = "Discount must be between")]
    fn test_calculate_discount_invalid_percent() {
        calculate_discount(100.0, 150.0).unwrap();
    }
}
```
Integration with CI/CD
Automated testing in continuous integration:
GitHub Actions Example
```yaml
name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov

      - name: Run tests
        run: |
          pytest --cov=src --cov-report=xml --cov-report=term

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage.xml

      - name: Check coverage threshold
        run: |
          pytest --cov=src --cov-fail-under=80
```
Pre-commit Hooks
Run tests before allowing commits:
```bash
#!/bin/bash
# .git/hooks/pre-commit (make executable with: chmod +x .git/hooks/pre-commit)

echo "Running tests..."
npm test

if [ $? -ne 0 ]; then
  echo "Tests failed. Commit aborted."
  exit 1
fi

echo "Tests passed. Proceeding with commit."
```
Pull Request Requirements
Enforce testing in PR reviews:
- All tests must pass before merge
- Coverage must not decrease
- New code must have tests
- Integration tests for new features
Quality Checklist
Before considering testing complete:
Test Coverage
- Business logic has 90%+ coverage
- API endpoints have integration tests
- Edge cases are tested (zero, negative, null, empty)
- Error conditions are tested
- Happy path and sad path both covered
Test Quality
- Tests follow AAA pattern (Arrange-Act-Assert)
- Test names describe scenario and expected result
- Tests are independent (order doesn't matter)
- Tests use appropriate assertions (specific, not generic)
- Mocks are used appropriately (external deps only)
Test Maintainability
- No code duplication (use fixtures/helpers)
- Tests are fast (unit tests < 100ms each)
- Tests are deterministic (no flaky tests)
- Test data is realistic but minimal
- Tests don't test implementation details
Integration with Workflow
- Tests run in CI/CD pipeline
- Coverage reported and tracked
- Tests run before commits (pre-commit hook)
- Failing tests block merge
- Coverage thresholds enforced
Anti-Patterns
What NOT to Do
DON'T write tests that test the framework:
```python
# Bad - testing pytest itself
def test_assert_works():
    assert 1 == 1
```
DON'T test implementation details:
```python
# Bad - tests internal private method
def test_internal_validation():
    user_service = UserService()
    assert user_service._validate_email('test@example.com') == True


# Good - tests public behavior
def test_create_user_with_invalid_email():
    user_service = UserService()
    with pytest.raises(ValidationError):
        user_service.create_user(email='invalid')
```
DON'T have tests that depend on each other:
```python
# Bad - test order matters
def test_create_user():
    global created_user
    created_user = create_user('Jane')
    assert created_user.name == 'Jane'

def test_update_user():
    created_user.name = 'John'  # Depends on previous test!
    assert created_user.name == 'John'


# Good - each test independent
def test_create_user():
    user = create_user('Jane')
    assert user.name == 'Jane'

def test_update_user():
    user = create_user('Jane')  # Setup within test
    user.name = 'John'
    assert user.name == 'John'
```
DON'T ignore flaky tests:
```python
# Bad - flaky test that sometimes passes
def test_async_operation():
    start_async_job()
    time.sleep(1)  # Hope it finishes in 1 second
    assert job_completed()


# Good - deterministic test
def test_async_operation():
    start_async_job()
    wait_for_condition(lambda: job_completed(), timeout=5)  # polling helper, sketched below
    assert job_completed()
```
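`wait_for_condition` above is not a standard-library function; a minimal polling helper along those lines might look like this:

```python
import time


def wait_for_condition(predicate, timeout=5.0, interval=0.05):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(interval)
    raise TimeoutError("condition not met within timeout")
```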
DON'T have tests that are slower than necessary:
```python
# Bad - integration test for pure function
def test_calculate_total():
    db = create_database_connection()
    result = calculate_total(100, 0.1)  # Doesn't need DB!
    assert result == 110


# Good - unit test with no dependencies
def test_calculate_total():
    result = calculate_total(100, 0.1)
    assert result == 110
```
DON'T use generic assertions:
```python
# Bad - generic assertion, unhelpful failure message
def test_user_creation():
    user = create_user('Jane', 'jane@example.com')
    assert user  # What exactly failed if this breaks?


# Good - specific assertions
def test_user_creation():
    user = create_user('Jane', 'jane@example.com')
    assert user.name == 'Jane'
    assert user.email == 'jane@example.com'
    assert user.id is not None
```
Integration with Backlog Workflow
Testing integrates with task execution:
During Task Planning
- Include test requirements in acceptance criteria
- Example: "Feature X implemented with 80%+ test coverage"
During Task Execution
- Write tests alongside code (TDD or test-as-you-go)
- Run tests frequently during development
- Commit tests with implementation code
During Code Review
- Review test quality (see code-review skill)
- Verify coverage for new code
- Check test names and structure
- Ensure tests actually test the right things
Task Completion
- All tests passing is part of "done"
- Coverage meets project standards
- No skipped/disabled tests without justification
Example acceptance criteria:
Task: Implement user authentication

Acceptance Criteria:
- [ ] Login endpoint returns JWT on success
- [ ] Login endpoint returns 401 for invalid credentials
- [ ] JWT contains user ID and role
- [ ] Tests:
  - [ ] Unit tests for token generation (90%+ coverage)
  - [ ] Integration tests for login endpoint
  - [ ] E2E test for login flow
  - [ ] Test coverage overall 80%+
  - [ ] All tests passing in CI
Remember: Tests are an investment in quality and confidence. Good tests enable rapid development by catching bugs early and making refactoring safe. Follow the testing pyramid, write clear tests using the AAA pattern, and integrate testing into your development workflow from day one.