Why We Built This

Every engineering team we talk to has the same question: "We bought Copilot/Cursor/Claude -- why isn't it working as well as the demos?" The answer is almost always the same: the codebase isn't ready. No tests for the AI to validate against. No documentation for the AI to learn from. No CI pipeline to catch the AI's mistakes. No clear architecture for the AI to navigate.

DAF Benchmark exists to make that invisible problem visible. We give you a number -- your Dark Factory Score -- and a concrete list of what to fix. Not vague advice. Specific, prioritized recommendations with estimated impact.

How We Score

We analyze 5 dimensions of AI readiness, weighted by their impact on AI agent effectiveness. Each dimension is scored 0-5 based on concrete signals we detect in your repository -- not opinions, not heuristics, not vibes. Every score maps to specific, observable facts about your codebase.

The weights reflect our experience building production software with AI agents across 23 repositories: Context Readiness and Test Infrastructure are weighted highest (25% each) because they have the most direct impact on whether an AI agent can work effectively. A codebase with great docs but no tests is just as problematic as one with great tests but no docs.

We also classify your repo into a Complexity Tier (1-5) so we can compare fairly. A single-file utility shouldn't be compared against a multi-service platform. Scoring expectations scale with complexity.

Context Readiness (25%): Can AI understand your codebase without a human explaining it?

Test Infrastructure (25%): Can AI validate its own changes?

Architecture Clarity (20%): Can AI navigate your code and understand component boundaries?

Automation Maturity (15%): How much of your workflow runs without human intervention?

AI Integration (15%): Is AI a first-class development partner in your workflow?
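To make the weighting concrete, here is a minimal sketch of how a weighted score could be combined from the five dimensions. The function name, dimension keys, and the 0-100 output scale are assumptions for illustration; only the weights and the 0-5 per-dimension scale come from the description above.

```python
# Weights as listed above; dimension keys are hypothetical names.
WEIGHTS = {
    "context_readiness": 0.25,
    "test_infrastructure": 0.25,
    "architecture_clarity": 0.20,
    "automation_maturity": 0.15,
    "ai_integration": 0.15,
}

def dark_factory_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0-5 each) into one overall number.

    Assumed normalization: the weighted average (still on a 0-5 scale)
    is mapped to 0-100 for readability.
    """
    weighted = sum(WEIGHTS[d] * s for d, s in dimension_scores.items())
    return round(weighted / 5 * 100, 1)

# Example: strong architecture, weak automation.
print(dark_factory_score({
    "context_readiness": 4,
    "test_infrastructure": 3,
    "architecture_clarity": 5,
    "automation_maturity": 2,
    "ai_integration": 3,
}))  # → 70.0
```

Because Context Readiness and Test Infrastructure each carry 25%, a one-point drop in either moves this hypothetical score by 5 points, versus 3 points for the 15% dimensions.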

Your Code Stays Yours

We never store your source code. Scans run in ephemeral containers that are destroyed after analysis. We store scores, recommendations, and repository metadata (name, language, file count) -- never file contents. For private repos, our GitHub App requests the minimum permissions needed: read-only access to code and metadata, scoped to repos you explicitly authorize. We never write to your repo. We never share your scores without your permission.

The Dark Agent Factory Ecosystem

Built by Woodstock Software

DAF Benchmark is built by Woodstock Software LLC, the team behind the Dark Agent Factory ecosystem. We build tools that help engineering teams work more effectively with AI.