structured investigation protocol

harsh-critic finds what's wrong and what's missing

A 5-phase review protocol that forces systematic coverage through pre-commitment predictions, multi-perspective investigation, explicit gap analysis, confidence-gated self-audit, and pragmatic severity calibration.

  • Composite score: 58.6%
  • Win–loss record: 7–0
  • Missing coverage: 62.5%
  • Gap items (A/B test): 33 vs 0
Phase 1 Pre-commitment Predictions

Before reading the work in detail, predict the 3-5 most likely problem areas based on type and domain. This activates deliberate search — instead of passively reading and reacting, the reviewer enters with specific hypotheses to confirm or reject.

Hypothesis-driven · Domain-aware · 3–5 predictions
Phase 2 Verification

Every technical claim is verified against the actual codebase. No assertion is taken on trust. The approach adapts to the artifact type:

For code: Trace execution paths, especially error paths and edge cases. Check for off-by-one errors, race conditions, missing null checks, incorrect type assumptions, and security oversights.
For plans: A 6-step investigation: assumptions extraction, pre-mortem analysis, dependency audit, ambiguity scan, feasibility check, and rollback analysis. Devil's advocate challenge on each major decision.
Phase 3 Multi-perspective Review

The work is re-examined from distinct angles adapted to what's being reviewed:

For code:
  • Security Engineer: Trust boundaries, unvalidated input, exploitation vectors.
  • New Hire: Can someone unfamiliar follow this? What context is assumed?
  • Ops Engineer: Behavior at scale, under load, when dependencies fail. Blast radius.

For plans:
  • Executor: Can I complete each step with only what's written? Where will I get stuck?
  • Stakeholder: Does this solve the stated problem? Are success criteria meaningful?
  • Skeptic: What's the strongest argument that this approach will fail?
Adaptive Harshness

Starts in THOROUGH mode. Escalates to ADVERSARIAL if serious problems surface during Phases 2–4 — actively hunting for hidden issues and expanding scope.

Escalation triggers:
  • Any CRITICAL finding
  • 3+ MAJOR findings
  • A systemic pattern
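The escalation rule can be sketched as a small predicate over the running list of findings. This is an illustrative sketch only: the `Finding` shape and its field names are assumptions, not part of the skill.

```python
from dataclasses import dataclass

# Sketch of the THOROUGH -> ADVERSARIAL escalation rule.
# The Finding shape and field names are illustrative assumptions.

@dataclass
class Finding:
    severity: str           # "CRITICAL" | "MAJOR" | "MINOR"
    systemic: bool = False  # part of a repeating pattern?

def review_mode(findings: list[Finding]) -> str:
    """Escalate on any CRITICAL finding, 3+ MAJOR findings, or a systemic pattern."""
    if any(f.severity == "CRITICAL" for f in findings):
        return "ADVERSARIAL"
    if sum(1 for f in findings if f.severity == "MAJOR") >= 3:
        return "ADVERSARIAL"
    if any(f.systemic for f in findings):
        return "ADVERSARIAL"
    return "THOROUGH"
```

Note that the mode is re-evaluated as Phases 2–4 add findings, so a review can start thorough and turn adversarial mid-stream.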
Phase 4 Gap Analysis (Core Differentiator)

This is the phase that makes the skill useful. The reviewer explicitly searches for what isn't there — the omissions, the unconsidered, the conveniently skipped:

  • What would break this?
  • What edge case isn't handled?
  • What assumption could be wrong?
  • What was conveniently left out?
A/B tests: 33 gap items found with this phase vs. 0 without it.
Phase 4.5 Self-Audit

A metacognitive check on all findings before finalizing. Each CRITICAL/MAJOR finding is assessed for confidence level, refutability, and whether it's a genuine flaw vs. a stylistic preference. Low-confidence findings are moved to Open Questions rather than scored sections.

  • HIGH confidence → scored
  • MEDIUM confidence → scored + flagged
  • LOW confidence → Open Questions
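A minimal sketch of this routing, assuming a dict-shaped finding with a `confidence` field (the shape and key names are assumptions):

```python
# Sketch of Phase 4.5 confidence routing. "Open Questions" mirrors the
# report section name; the dict-based finding shape is an assumption.

def route_finding(finding: dict) -> dict:
    """Route a CRITICAL/MAJOR finding by self-audit confidence."""
    confidence = finding["confidence"]  # "HIGH" | "MEDIUM" | "LOW"
    if confidence == "LOW":
        # Low-confidence findings never affect the score.
        return {**finding, "section": "Open Questions", "scored": False}
    return {
        **finding,
        "section": "Findings",
        "scored": True,
        "flagged": confidence == "MEDIUM",  # scored but marked uncertain
    }
```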
Phase 4.75 Realist Check

Pragmatic severity calibration for CRITICAL and MAJOR findings. Each high-severity finding is pressure-tested with four questions:

  • What's the realistic worst case?
  • What mitigating factors exist?
  • How quickly would this be detected?
  • Is momentum bias inflating the severity?

Findings where real-world impact doesn't match the label get downgraded. Every downgrade must include a "Mitigated by: ..." statement explaining the real-world factor that justifies the lower severity. Recalibrations are reported in the Verdict Justification.

Hard guardrail: Data loss, security breaches, and financial impact findings are never downgraded, regardless of mitigating factors.
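The downgrade rule and its guardrail can be sketched as follows. The category labels in `PROTECTED`, the severity ordering, and the finding shape are assumptions for illustration:

```python
# Sketch of the Realist Check downgrade rule with its hard guardrail.
# Category labels and the finding shape are illustrative assumptions.

PROTECTED = {"data-loss", "security", "financial"}   # never downgraded
SEVERITIES = ["MINOR", "MAJOR", "CRITICAL"]          # low -> high

def downgrade(finding: dict, mitigated_by: str) -> dict:
    """Lower a finding one severity level, recording the mandatory rationale."""
    if not mitigated_by:
        raise ValueError("every downgrade must state 'Mitigated by: ...'")
    if finding.get("category") in PROTECTED:
        raise ValueError("data loss, security, and financial findings are never downgraded")
    idx = SEVERITIES.index(finding["severity"])
    if idx == 0:
        return finding  # already at the lowest severity
    return {**finding,
            "severity": SEVERITIES[idx - 1],
            "mitigated_by": mitigated_by}
```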
Phase 5 Synthesis

Findings are compared against pre-commitment predictions, calibrated for severity, and assembled into a structured verdict. The synthesis explicitly tracks which predictions were confirmed, which were wrong, and what was found that wasn't predicted.
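The prediction-tracking step might be sketched as simple set arithmetic, assuming predictions and findings can be matched on a shared topic label (a simplification; real matching is a judgment call):

```python
# Sketch of Phase 5 prediction reconciliation. Matching on a shared
# topic label is an assumption made for illustration.

def reconcile(predictions: set[str], finding_topics: set[str]) -> dict:
    """Split predictions and findings into confirmed / wrong / unpredicted."""
    return {
        "confirmed": predictions & finding_topics,    # predicted and found
        "wrong": predictions - finding_topics,        # predicted, not found
        "unpredicted": finding_topics - predictions,  # found, not predicted
    }
```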

Structured Output

Every review produces one of four verdicts and a report with a fixed set of sections.

Verdicts:
  • REJECT: Fundamental flaws. Cannot proceed.
  • REVISE: Significant rework needed.
  • ACCEPT-WITH-RESERVATIONS: Proceed with noted concerns.
  • ACCEPT: Clean. Ship it.
Report sections:
  • Pre-commitment Predictions: Expected vs. found.
  • Critical Findings: Blocks execution; file:line evidence required.
  • Major Findings: Significant rework; evidence required.
  • Minor Findings: Suboptimal but functional.
  • What's Missing: The core differentiator. Gaps, unhandled edge cases, unstated assumptions. A/B testing: 33 items found with this section vs. 0 without.
  • Multi-perspective Notes: Security, new-hire, and ops concerns.
  • Ambiguity Risks: Plan reviews only; multiple valid readings.
  • Verdict Justification: Why this verdict; Realist Check recalibrations noted here.
  • Open Questions: Low-confidence findings moved here by the Self-Audit.