Using SQL Testing Library with Mocksmith

Mocksmith is a complementary library that provides automatic mock data generation for Python classes. When combined with SQL Testing Library, it dramatically reduces the boilerplate code needed to create test data.

Installation

# Install both libraries with required extras
pip install sql-testing-library
pip install mocksmith[mock,pydantic]

# Or as an optional dependency (coming soon)
pip install sql-testing-library[mocksmith]

Quick Comparison

Without Mocksmith (Traditional Approach)

from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
import random

@dataclass
class Customer:
    id: int
    name: str
    email: str
    credit_limit: Decimal
    created_date: date
    is_active: bool

# Manually create test data
customers = []
for i in range(100):
    customers.append(Customer(
        id=i + 1,
        name=f"Customer {i + 1}",  # Not realistic
        email=f"customer{i + 1}@example.com",  # Generic
        credit_limit=Decimal(str(round(random.uniform(1000, 10000), 2))),
        created_date=date.today() - timedelta(days=random.randint(1, 365)),
        is_active=random.choice([True, False])
    ))

With Mocksmith

from mocksmith import mockable, Varchar, Integer, Money, Date, Boolean

@mockable
@dataclass
class Customer:
    id: Integer()
    name: Varchar(100)  # Generates realistic names
    email: Varchar(255)  # Generates valid emails
    credit_limit: Money(min_value="1000", max_value="10000")
    created_date: Date()
    is_active: Boolean()

# Generate realistic test data in one line!
customers = [Customer.mock() for _ in range(100)]

Varchar Constraints (Mocksmith 3.0.1+)

Mocksmith now supports startswith and endswith constraints for Varchar fields:

from mocksmith import mockable, Varchar, Integer

@mockable
@dataclass
class User:
    user_id: Varchar(50, startswith="user_")  # e.g., "user_12345"
    email: Varchar(100, endswith="@company.com")  # e.g., "john.doe@company.com"

@mockable
@dataclass
class Product:
    sku: Varchar(20, startswith="PROD-", endswith="-US")  # e.g., "PROD-ABC123-US"
    barcode: Varchar(13, startswith="978")  # e.g., "9781234567890"
    serial_number: Varchar(20, startswith="SN-")  # e.g., "SN-2024-ABC-123"

# Generate data with constraints
users = [User.mock() for _ in range(10)]
products = [Product.mock() for _ in range(50)]

Real-World Testing Example

Here’s a complete example testing a customer segmentation query:

from mocksmith import mockable, Varchar, Integer, Money, Date
from sql_testing_library import sql_test, TestCase, BaseMockTable
from pydantic import BaseModel

# Define your data model with Mocksmith
@mockable
@dataclass
class Order:
    order_id: Integer()
    customer_id: Integer()
    order_date: Date()
    total_amount: Money()
    status: Varchar(20)

# Create mock table
class OrdersMockTable(BaseMockTable):
    def get_database_name(self) -> str:
        return "sales_db"

    def get_table_name(self) -> str:
        return "orders"

# Define expected result
class CustomerSegmentResult(BaseModel):
    customer_id: int
    total_spent: Decimal
    order_count: int
    customer_segment: str

# Write your test
@sql_test(
    mock_tables=[
        OrdersMockTable([
            Order.mock(
                customer_id=i % 20 + 1,  # 20 customers
                status="completed" if random.random() > 0.1 else "cancelled"
            ) for i in range(200)  # 200 orders
        ])
    ],
    result_class=CustomerSegmentResult
)
def test_customer_segmentation():
    return TestCase(
        query="""
            SELECT
                customer_id,
                SUM(total_amount) as total_spent,
                COUNT(*) as order_count,
                CASE
                    WHEN SUM(total_amount) > 5000 THEN 'VIP'
                    WHEN SUM(total_amount) > 1000 THEN 'Regular'
                    ELSE 'Basic'
                END as customer_segment
            FROM orders
            WHERE status = 'completed'
            GROUP BY customer_id
            HAVING COUNT(*) > 0
            ORDER BY total_spent DESC
        """,
        default_namespace="sales_db"
    )

# Run the test
results = test_customer_segmentation()

# Mocksmith generated realistic data:
# - Real customer names
# - Valid monetary values
# - Realistic date distributions
# - Proper status values

Advanced Patterns

1. Testing Edge Cases

# Create specific test scenarios
edge_case_orders = [
    # High-value orders
    Order.mock(total_amount=Decimal("10000"), status="completed"),
    # Zero-value orders
    Order.mock(total_amount=Decimal("0"), status="completed"),
    # Cancelled orders
    Order.mock(status="cancelled"),
]

2. Time-Series Testing

from datetime import datetime, timedelta

# Generate orders for specific time periods
black_friday_orders = [
    Order.mock(
        order_date=datetime(2023, 11, 24).date(),
        total_amount=Decimal(str(random.uniform(100, 1000)))
    ) for _ in range(100)
]

3. Hierarchical Data

@mockable
@dataclass
class Employee:
    id: Integer()
    name: Varchar(100)
    manager_id: Optional[Integer()]  # Self-referential
    department: Varchar(50)
    salary: Money()

# Create organizational hierarchy
executives = [Employee.mock(manager_id=None) for _ in range(5)]
managers = [Employee.mock(manager_id=random.choice([e.id for e in executives])) for _ in range(20)]
employees = [Employee.mock(manager_id=random.choice([m.id for m in managers])) for _ in range(100)]

Benefits

Less Code: ~70% reduction in test data setup code
Realistic Data: Automatic generation of realistic names, emails, dates, etc.
Type Safety: SQL-like type annotations ensure data matches database schemas
Flexible: Easy to override specific fields for test scenarios
Maintainable: Changes to data models automatically reflected in tests

Best Practices

Use Mocksmith for bulk data generation, manual creation for specific edge cases
Leverage type constraints (min/max values, string lengths) to match database constraints
Create reusable factory functions for common test scenarios
Combine with SQL Testing Library’s features like CTEs and physical tables modes