How It Works

Test quality over quantity.

Germanium connects a formal model of your entire system with an adaptive test engine that learns which strategies find bugs. The result is comprehensive coverage at a fraction of the execution cost, with findings you can replay and act on.

Component 01

Documentation first.

Most testing tools assume you know your system well. In practice, architecture drifts from documentation. Developers hold critical knowledge informally. New team members test from incomplete mental models.

Germanium builds a structured, machine-readable model of your system from guided workflows, code analysis, or existing diagrams. The model captures component boundaries, interaction patterns, data contracts, and expected behaviors.

As teams adopt documentation-first best practices, these artifacts become the foundation of precise characterization tests.

Gherkin behavioral specs — Given/When/Then scenarios in plain language
API data contracts — types, required fields, valid ranges, enum values
Architecture diagrams — exports to PlantUML, Mermaid, Jira, Lucid
Baseline test suites — coverage of documented paths from day one
Code inference — for existing codebases with minimal documentation

checkout_service.feature generated

Feature: Checkout payment processing

  Scenario: OAuth user, null email
    Given a user authenticated via OAuth
    And the email field is null
    When payment validation runs
    Then return 422 with structured error
    And no unhandled exception occurs

# discovered by adaptive test run
# added to spec automatically

payment.contract.json

user_id	string	required
email	string	nullable ⚠
payment_method	enum	required
amount_cents	integer	required · min: 1
promo_code	string	optional
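
To make the contract above concrete, here is a minimal sketch of how its five rules could be checked against a request payload. The function name `validate_payment` and the enum values in `PAYMENT_METHODS` are assumptions for illustration; the contract shown does not list the allowed payment methods, and this is not Germanium's own validator.

```python
from typing import Any

# Hypothetical enum values: the contract above does not list them.
PAYMENT_METHODS = {"card", "bank_transfer"}


def validate_payment(payload: dict[str, Any]) -> list[str]:
    """Return a list of contract violations; an empty list means valid."""
    errors = []

    # user_id: string, required
    if not isinstance(payload.get("user_id"), str):
        errors.append("user_id: required string")

    # email: string, nullable (None is explicitly allowed)
    email = payload.get("email")
    if email is not None and not isinstance(email, str):
        errors.append("email: must be string or null")

    # payment_method: enum, required
    if payload.get("payment_method") not in PAYMENT_METHODS:
        errors.append("payment_method: required enum value")

    # amount_cents: integer, required, min 1 (bool is rejected explicitly)
    amount = payload.get("amount_cents")
    if not isinstance(amount, int) or isinstance(amount, bool) or amount < 1:
        errors.append("amount_cents: required integer >= 1")

    # promo_code: string, optional (only checked when present)
    if "promo_code" in payload and not isinstance(payload["promo_code"], str):
        errors.append("promo_code: must be string when present")

    return errors
```

A payload with a null email passes the contract, which is exactly why downstream code has to handle it.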

Component 02

Explore intelligently.

A 10-field API endpoint with 3 possible states per field produces 3^10 = 59,049 input combinations, before accounting for optional fields, type coercions, and cross-field dependencies. Exhaustive testing is a fantasy.
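
The count comes from multiplying the number of states across fields: 3 multiplied by itself 10 times gives 59,049, or roughly 60,000. A short sketch of the arithmetic, with a brute-force check on a smaller 4-field case (the variable names are illustrative):

```python
from itertools import product

fields = 10
states_per_field = 3

# Each field independently takes one of 3 states, so the counts multiply.
total = states_per_field ** fields

# Brute-force enumeration of a smaller 4-field case confirms the formula.
small = sum(1 for _ in product(range(states_per_field), repeat=4))

print(total)  # 59049
print(small)  # 81
```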

The test engine uses the system model to explore broadly, then concentrates effort where bugs surface and deprioritizes zones that stay bug-free. It runs three phases tuned to spend budget where it counts:

Initial exploration
Adaptive optimization
Final verification pass

The output is deduplicated, so a thousand variants of the same crash collapse into one finding. Users get one clear finding per root cause, a minimal reproduction case, and a plain-English explanation.
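
One simple way to picture the deduplication is grouping failures by a crash signature and keeping the smallest input as the representative. The sketch below assumes the signature is the exception type plus source location; how Germanium actually determines root causes is not described here, and the `Failure` class and `cluster_failures` function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Failure:
    exc_type: str   # e.g. "AttributeError"
    location: str   # e.g. "auth_service.py:247"
    inputs: dict    # the fields that were set when the failure occurred


def cluster_failures(failures):
    """Group failures by (exception type, location) and keep the
    failure with the fewest input fields as each group's representative."""
    groups = defaultdict(list)
    for failure in failures:
        groups[(failure.exc_type, failure.location)].append(failure)
    return {
        signature: min(group, key=lambda f: len(f.inputs))
        for signature, group in groups.items()
    }
```

Signature-based grouping is a coarse heuristic: two distinct bugs at the same line would be merged, so a real engine would likely need something richer.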

Bayesian optimization — learns which strategies find bugs in your specific codebase
Language-agnostic — tests at the API contract level, not the implementation
Recency weighting — new and recently modified code gets heavier coverage
Failure clustering — one report per root cause, not per variant
Full reproducibility — every failure captured with complete state for replay

budget allocation, adaptive phase learning

null combinations	82%
boundary values	71%
cross-field deps	54%
enum edge cases	38%
happy path	12%
random fuzzing

Budget dynamically reallocated as results come in

finding #3 of 4

CRASH · auth_service.py:247

OAuth users without a fallback email address cause an unhandled AttributeError during payment validation. Occurs across all payment methods when email is null and auth_provider is 'google'.

Minimal reproduction: 2 fields · Traditional tests: 0/147 caught this
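
The code at auth_service.py:247 is not shown, so the following is only an illustrative stand-in for this class of bug: a handler that assumes email is always a string, next to a version that returns a structured 422 as the generated spec requires. Both function names and the error body are hypothetical.

```python
def validate_payment_buggy(user: dict) -> tuple[int, dict]:
    # Assumes email is always a string; None has no .lower() method,
    # so a null email raises an unhandled AttributeError here.
    normalized = user["email"].lower()
    return 200, {"email": normalized}


def validate_payment_fixed(user: dict) -> tuple[int, dict]:
    email = user.get("email")
    if email is None:
        # Structured error instead of an unhandled exception.
        return 422, {"error": "email_required", "field": "email"}
    return 200, {"email": email.lower()}


# The two-field minimal reproduction described in the finding.
repro = {"auth_provider": "google", "email": None}
```

Calling `validate_payment_buggy(repro)` raises `AttributeError`, while `validate_payment_fixed(repro)` returns a 422 status with an error body.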

The feedback loop

The system builds upon itself.

The test engine updates the spec for each edge case it discovers. In turn, updated specs produce more precise tests next time. The cycle is continuous, not linear.

Over time, the model accumulates knowledge about which bug patterns appear across systems. New code receives more scrutiny while stable code gets regression protection. Testing improves without requiring more effort from your team.

01 System model captures current architecture and contracts

02 Test engine explores using the model as a precision guide

03 Edge cases that fall outside the spec are surfaced and flagged

04 Spec is updated with new findings — loop repeats
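
Step 04 can be pictured as turning a finding into a Gherkin scenario and appending it to the feature file. This is a rough sketch under the assumption that a finding can be reduced to given/when/then steps; `scenario_from_finding` and `append_to_feature` are hypothetical helpers, not part of Germanium's API.

```python
def scenario_from_finding(title, given, when, then):
    """Render a finding as an indented Gherkin scenario.

    `given` and `then` are lists of steps; the first step in each list
    gets the keyword, later ones are joined with 'And'.
    """
    lines = [f"  Scenario: {title}"]
    lines += [f"    {'Given' if i == 0 else 'And'} {step}" for i, step in enumerate(given)]
    lines.append(f"    When {when}")
    lines += [f"    {'Then' if i == 0 else 'And'} {step}" for i, step in enumerate(then)]
    return "\n".join(lines)


def append_to_feature(feature_text, scenario):
    """Append a scenario to existing feature text, separated by a blank line."""
    return feature_text.rstrip("\n") + "\n\n" + scenario + "\n"


scenario = scenario_from_finding(
    "OAuth user, null email",
    given=["a user authenticated via OAuth", "the email field is null"],
    when="payment validation runs",
    then=["return 422 with structured error", "no unhandled exception occurs"],
)
feature = append_to_feature("Feature: Checkout payment processing\n", scenario)
```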

Under the hood

Built for production engineering teams.

The numbers are specific because the design is specific.

Phase 1 budget — initial exploration

15%

Uniform sampling to bootstrap the model before adaptive optimization takes over. Prevents cold-start bias toward incorrect assumptions.

Language support

Any.

Tests at the API contract level, not the implementation. Python, Java, Go, Node, Rust — or a polyglot microservice architecture. It doesn't matter.

Phase 2 budget — adaptive optimization

75%

Thompson Sampling dynamically allocates across strategies based on observed bug discovery rates. The majority of execution time, used intelligently.
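
For readers unfamiliar with Thompson Sampling, here is a minimal Beta-Bernoulli sketch: each strategy keeps a count of runs that found a bug and runs that did not, the allocator samples a plausible discovery rate for each, and the highest sample gets the next run. The class name, the uniform Beta(1, 1) prior, and the simulated discovery rates are assumptions for illustration, not Germanium's actual implementation.

```python
import random


class ThompsonAllocator:
    def __init__(self, strategies, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per strategy: [bug runs + 1, clean runs + 1]
        self.stats = {name: [1, 1] for name in strategies}

    def choose(self):
        # Sample a plausible bug-discovery rate for every strategy
        # and run the one whose sample is highest.
        samples = {
            name: self.rng.betavariate(a, b)
            for name, (a, b) in self.stats.items()
        }
        return max(samples, key=samples.get)

    def update(self, name, found_bug):
        self.stats[name][0 if found_bug else 1] += 1


def simulate(true_rates, rounds, seed=0):
    """Run the allocator against made-up discovery rates and count
    how many runs each strategy receives."""
    allocator = ThompsonAllocator(true_rates, seed=seed)
    world = random.Random(seed + 1)  # separate RNG for simulated outcomes
    counts = {name: 0 for name in true_rates}
    for _ in range(rounds):
        name = allocator.choose()
        counts[name] += 1
        allocator.update(name, world.random() < true_rates[name])
    return counts
```

With made-up rates such as `simulate({"null combinations": 0.30, "happy path": 0.02}, rounds=500)`, most runs drift toward the strategy that keeps finding bugs, while the weaker one still gets occasional samples.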

Test execution reduction

40–60%

Fewer test runs than exhaustive approaches, with comparable defect coverage. Bayesian optimization concentrates budget on productive strategies.

Phase 3 budget — verification pass

10%

A directed final pass over critical paths and any gaps left by adaptive exploration. The safety net that catches what bandit optimization could miss.
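
As a worked example of the 15% / 75% / 10% split, the sketch below divides a run budget across the three phases using integer arithmetic. Assigning any rounding remainder to the adaptive phase is an assumption made here; `split_budget` is a hypothetical helper.

```python
# Phase shares in whole percent, taken from the figures on this page.
PHASE_SHARES = {
    "initial exploration": 15,
    "adaptive optimization": 75,
    "final verification": 10,
}


def split_budget(total_runs: int) -> dict[str, int]:
    """Split a run budget across the three phases.

    Integer division avoids float rounding; leftover runs go to the
    adaptive phase (an assumption, not a documented rule).
    """
    allocation = {
        phase: total_runs * pct // 100 for phase, pct in PHASE_SHARES.items()
    }
    allocation["adaptive optimization"] += total_runs - sum(allocation.values())
    return allocation


print(split_budget(1000))
# {'initial exploration': 150, 'adaptive optimization': 750, 'final verification': 100}
```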

Reproducibility

Complete.

Every failure is fully reproducible. Inputs, seeds, and bandit decisions are captured so any bug can be recreated on demand during investigation.
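
Seed capture is what makes replay possible: if every random choice flows from a recorded seed, rerunning with that seed regenerates the same inputs. A minimal sketch, assuming a made-up input generator and record format (`generate_inputs`, `record_run`, and `replay_matches` are hypothetical names):

```python
import json
import random


def generate_inputs(seed: int) -> dict:
    """Generate a test payload from a seed. All randomness comes from
    a local Random instance, so the same seed gives the same payload."""
    rng = random.Random(seed)
    return {
        "email": rng.choice([None, "a@example.com"]),
        "amount_cents": rng.randint(-1, 100),
    }


def record_run(seed: int) -> str:
    """Capture the seed and generated inputs as a JSON record."""
    return json.dumps({"seed": seed, "inputs": generate_inputs(seed)})


def replay_matches(record_json: str) -> bool:
    """Regenerate inputs from the recorded seed and compare them
    with what was captured at failure time."""
    record = json.loads(record_json)
    return generate_inputs(record["seed"]) == record["inputs"]
```

Using a dedicated `random.Random(seed)` rather than the module-level generator keeps replay independent of any other code that draws random numbers.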
