structured investigation protocol

harsh-critic finds what's wrong and what's missing

A 5-phase review protocol that forces systematic coverage through pre-commitment predictions, multi-perspective investigation, explicit gap analysis, confidence-gated self-audit, and pragmatic severity calibration.

  • Composite score: 58.6%
  • Win–loss record: 7–0
  • Missing coverage: 62.5%
  • Gap items (A/B test): 33 vs 0
Phase 1 Pre-commitment Predictions

Before reading the work in detail, predict the 3-5 most likely problem areas based on type and domain. This activates deliberate search — instead of passively reading and reacting, the reviewer enters with specific hypotheses to confirm or reject.

Hypothesis-driven · Domain-aware · 3–5 predictions
Phase 2 Verification

Every technical claim is verified against the actual codebase. No assertion is taken on trust. The approach adapts to the artifact type:

For code: Trace execution paths, especially error paths and edge cases. Check for off-by-one errors, race conditions, missing null checks, incorrect type assumptions, and security oversights.
For plans: A 6-step investigation: assumptions extraction, pre-mortem analysis, dependency audit, ambiguity scan, feasibility check, and rollback analysis. Devil's advocate challenge on each major decision.
Phase 3 Multi-perspective Review

The work is re-examined from distinct angles adapted to what's being reviewed:

For code:
  • Security Engineer: Trust boundaries, unvalidated input, exploitation vectors.
  • New Hire: Can someone unfamiliar follow this? What context is assumed?
  • Ops Engineer: Behavior at scale, under load, when dependencies fail. Blast radius.

For plans:
  • Executor: Can I complete each step with only what's written? Where will I get stuck?
  • Stakeholder: Does this solve the stated problem? Are success criteria meaningful?
  • Skeptic: What's the strongest argument that this approach will fail?
Adaptive Harshness

Starts in THOROUGH mode. Escalates to ADVERSARIAL if serious problems surface during Phases 2–4 — actively hunting for hidden issues and expanding scope.

Escalation triggers:
  • Any CRITICAL finding
  • 3+ MAJOR findings
  • A systemic pattern
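The escalation rule can be sketched as a small predicate over the running list of findings. This is an illustrative sketch only: the `Finding` shape and its field names are assumptions, not part of the skill.

```python
from dataclasses import dataclass

# Sketch of the THOROUGH -> ADVERSARIAL escalation rule.
# The Finding shape and field names are illustrative assumptions.

@dataclass
class Finding:
    severity: str           # "CRITICAL" | "MAJOR" | "MINOR"
    systemic: bool = False  # part of a repeating pattern?

def review_mode(findings: list[Finding]) -> str:
    """Escalate on any CRITICAL finding, 3+ MAJOR findings, or a systemic pattern."""
    if any(f.severity == "CRITICAL" for f in findings):
        return "ADVERSARIAL"
    if sum(1 for f in findings if f.severity == "MAJOR") >= 3:
        return "ADVERSARIAL"
    if any(f.systemic for f in findings):
        return "ADVERSARIAL"
    return "THOROUGH"
```

Note that the mode is re-evaluated as Phases 2–4 add findings, so a review can start thorough and turn adversarial mid-stream.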
Phase 4 Gap Analysis (Core Differentiator)

This is the phase that makes the skill useful. The reviewer explicitly searches for what isn't there — the omissions, the unconsidered, the conveniently skipped:

  • What would break this?
  • What edge case isn't handled?
  • What assumption could be wrong?
  • What was conveniently left out?
A/B tests: 33 gap items found with this phase vs. 0 without it.
Phase 4.5 Self-Audit

A metacognitive check on all findings before finalizing. Each CRITICAL/MAJOR finding is assessed for confidence level, refutability, and whether it's a genuine flaw vs. a stylistic preference. Low-confidence findings are moved to Open Questions rather than scored sections.

  • HIGH confidence → scored
  • MEDIUM confidence → scored + flagged
  • LOW confidence → Open Questions
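A minimal sketch of this routing, assuming a dict-shaped finding with a `confidence` field (the shape and key names are assumptions):

```python
# Sketch of Phase 4.5 confidence routing. "Open Questions" mirrors the
# report section name; the dict-based finding shape is an assumption.

def route_finding(finding: dict) -> dict:
    """Route a CRITICAL/MAJOR finding by self-audit confidence."""
    confidence = finding["confidence"]  # "HIGH" | "MEDIUM" | "LOW"
    if confidence == "LOW":
        # Low-confidence findings never affect the score.
        return {**finding, "section": "Open Questions", "scored": False}
    return {
        **finding,
        "section": "Findings",
        "scored": True,
        "flagged": confidence == "MEDIUM",  # scored but marked uncertain
    }
```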
Phase 4.75 Realist Check

Pragmatic severity calibration for CRITICAL and MAJOR findings. Each high-severity finding is pressure-tested with four questions:

  • What's the realistic worst case?
  • What mitigating factors exist?
  • How quickly would this be detected?
  • Is momentum bias inflating the severity?

Findings where real-world impact doesn't match the label get downgraded. Every downgrade must include a "Mitigated by: ..." statement explaining the real-world factor that justifies the lower severity. Recalibrations are reported in the Verdict Justification.

Hard guardrail: Data loss, security breaches, and financial impact findings are never downgraded, regardless of mitigating factors.
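The downgrade rule and its guardrail can be sketched as follows. The category labels in `PROTECTED`, the severity ordering, and the finding shape are assumptions for illustration:

```python
# Sketch of the Realist Check downgrade rule with its hard guardrail.
# Category labels and the finding shape are illustrative assumptions.

PROTECTED = {"data-loss", "security", "financial"}   # never downgraded
SEVERITIES = ["MINOR", "MAJOR", "CRITICAL"]          # low -> high

def downgrade(finding: dict, mitigated_by: str) -> dict:
    """Lower a finding one severity level, recording the mandatory rationale."""
    if not mitigated_by:
        raise ValueError("every downgrade must state 'Mitigated by: ...'")
    if finding.get("category") in PROTECTED:
        raise ValueError("data loss, security, and financial findings are never downgraded")
    idx = SEVERITIES.index(finding["severity"])
    if idx == 0:
        return finding  # already at the lowest severity
    return {**finding,
            "severity": SEVERITIES[idx - 1],
            "mitigated_by": mitigated_by}
```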
Phase 5 Synthesis

Findings are compared against pre-commitment predictions, calibrated for severity, and assembled into a structured verdict. The synthesis explicitly tracks which predictions were confirmed, which were wrong, and what was found that wasn't predicted.
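The prediction-tracking step might be sketched as simple set arithmetic, assuming predictions and findings can be matched on a shared topic label (a simplification; real matching is a judgment call):

```python
# Sketch of Phase 5 prediction reconciliation. Matching on a shared
# topic label is an assumption made for illustration.

def reconcile(predictions: set[str], finding_topics: set[str]) -> dict:
    """Split predictions and findings into confirmed / wrong / unpredicted."""
    return {
        "confirmed": predictions & finding_topics,    # predicted and found
        "wrong": predictions - finding_topics,        # predicted, not found
        "unpredicted": finding_topics - predictions,  # found, not predicted
    }
```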

Structured Output

Every review produces one of four verdicts and a report with a fixed set of sections.

Verdicts:
  • REJECT: Fundamental flaws. Cannot proceed.
  • REVISE: Significant rework needed.
  • ACCEPT-WITH-RESERVATIONS: Proceed with noted concerns.
  • ACCEPT: Clean. Ship it.
Report sections:
  • Pre-commitment Predictions: Expected vs. found.
  • Critical Findings: Blocks execution; file:line evidence required.
  • Major Findings: Significant rework; evidence required.
  • Minor Findings: Suboptimal but functional.
  • What's Missing: The core differentiator. Gaps, unhandled edge cases, unstated assumptions. A/B testing: 33 items found with this section vs. 0 without.
  • Multi-perspective Notes: Security, new-hire, and ops concerns.
  • Ambiguity Risks: Plan reviews only; multiple valid readings.
  • Verdict Justification: Why this verdict; Realist Check recalibrations noted here.
  • Open Questions: Low-confidence findings moved here by the Self-Audit.