The verification gap
AI writes the code.
Did anyone verify it?
We measure how much AI writes — lines, percentages, hours saved. We've gone quiet about the one question that decides whether it's safe to ship.
scroll ↓
“It worked in the demo” is where a 2 a.m. incident begins.
The model is confident whether it's right or wrong. The only thing that closes the gap between looks done and is done is you — running the tests, the build, the linter, and reading the diff before you trust it.
Three acts of every change
Agent Karma borrows a frame from the Gita: act with full effort, hold the outcome lightly. A coding session has the same three beats.
Dharma — intent
Say what you're actually trying to do, clearly, before the AI starts. A clear intent is a check you can verify against.
Karma — the action
The AI acts; you guide and validate. This is the work that counts — the tests you run, not the lines you accept.
Phal — the outcome
The result is a reflection, not a grade. Did you verify before you trusted? The score is honest because it's built only from what you actually ran.
कर्मण्येवाधिकारस्ते मा फलेषु कदाचन।
karmaṇy-evādhikāras te mā phaleṣu kadācana
“You have a right to your actions, never to the fruits of action.” — Bhagavad Gita 2.47
You can't control whether the AI's code is correct. You can control whether you checked. Agent Karma scores the part that's yours — the verifying.
How the score works — try it
The Karma Score (0–100) is objective: every point maps to a real validation you ran. Toggle the actions below — this is the actual engine from the extension, scoring live.
Toggle what you actually did. The score is computed by the real engine, live.
- ✓ Tests run 25/25
- ✓ Tests passed — observed 10/10
- ✓ Build / type-check ran clean 20/20
- · Lint ran clean 0/15
- · Test added/updated alongside code 0/15
- ✓ Change measured 5/5
- ✓ Prompt hygiene hint 0.6000000000000001/10
Notice: a test that ran but failed still earns “tests run” — you did the verifying. It just doesn't earn the “passed” bonus. The engine never rewards a check you skipped, so you can't score clean by never testing.