OPENAI PUB_DATE: 2026.01.23

AUDITABLE LLM CODE REVIEWS: DRC MODE, PROMPT TRANSPARENCY, DRIFT TESTS

Formalize LLM-assisted reviews with a session-level toggle: declare a Design Review Continuity (DRC) Mode to enforce consistent, auditable conversations (ChatGPT proposal 1), and log full prompt templates and system prompts for transparency (Codex prompt transparency 2). For reliability, adopt behavior-based evaluation: track time-based decay, contradictions, and response variance to detect drift and regressions in co-pilot outputs (Kruel.ai research thread 3).

  1. Adds: a concrete, user-declared continuity mode pattern for consistent design reviews. 

  2. Adds: emphasis on logging and exposing prompt/system-prompt lineage for auditability. 

  3. Adds: a practical evaluation lens using decay, contradiction, and variance signals for drift detection. 

[ WHY_IT_MATTERS ]
01.

Consistent review modes and prompt lineage make LLM-assisted code reviews auditable and compliant.

02.

Behavioral drift tests catch silent regressions that break quality gates and CI stability.

[ WHAT_TO_TEST ]
  • terminal

    Add a session_mode flag (e.g., DRC_ON) to your LLM client and measure impact on response consistency across review threads.

  • terminal

    Nightly evals: replay fixed prompts and datasets to track time-based decay, contradictions, and variance with alerts on thresholds.
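The nightly-eval idea above can be sketched as a small drift check. This is a minimal sketch, not a production harness: `response_variance` and `drift_alert` are hypothetical helper names, and the variance proxy here is exact-match uniqueness of replayed responses (a real harness might use embedding distance or an LLM judge for contradiction detection).

```python
import hashlib
import statistics

def response_variance(responses):
    """Variance proxy for replayed prompts: fraction of distinct
    normalized responses. 0.0 means every replay was identical;
    1.0 means every replay differed."""
    normalized = [" ".join(r.lower().split()) for r in responses]
    digests = {hashlib.sha256(n.encode()).hexdigest() for n in normalized}
    return (len(digests) - 1) / max(len(responses) - 1, 1)

def drift_alert(variance_history, threshold=0.3):
    """Fire an alert when the mean variance over recent nightly runs
    crosses a threshold -- the 'alerts on thresholds' step above."""
    if not variance_history:
        return False
    return statistics.fmean(variance_history) > threshold
```

In a real pipeline the variance scores per fixed prompt would be appended nightly, so a slow upward trend (time-based decay) shows up before any single run looks broken.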

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

  • 01.

    Wrap existing LLM calls to log full prompt/system prompts and introduce a DRC toggle without changing downstream logic.

  • 02.

    Start A/B runs with and without DRC mode and wire drift metrics into existing CI dashboards.
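The brownfield wrapping step can be sketched as a thin client decorator. This is an illustrative sketch only: `AuditedLLMClient` and the inner client's `complete(system, prompt)` method are hypothetical names, to be adapted to whatever interface your legacy call sites actually use. Downstream logic is untouched because the wrapper returns the inner response unchanged.

```python
import json
import time

class AuditedLLMClient:
    """Wraps an existing LLM client: logs full prompt/system-prompt
    lineage to a JSONL audit file and adds a DRC session toggle."""

    def __init__(self, inner_client, log_path="llm_audit.jsonl", drc_mode=False):
        self.inner = inner_client          # existing client, unchanged
        self.log_path = log_path
        self.drc_mode = drc_mode           # session-level DRC toggle

    def complete(self, system, prompt):
        record = {
            "ts": time.time(),
            "session_mode": "DRC_ON" if self.drc_mode else "DRC_OFF",
            "system_prompt": system,       # full lineage, not a digest
            "prompt": prompt,
        }
        response = self.inner.complete(system, prompt)
        record["response"] = response
        with open(self.log_path, "a") as f:
            f.write(json.dumps(record) + "\n")
        return response                    # downstream logic sees no change
```

Because the toggle lives on the wrapper, the A/B runs above reduce to constructing two clients, one with `drc_mode=True` and one without, and comparing the resulting audit logs.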

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

  • 01.

    Stand up a prompt registry with versioning and session policies (e.g., DRC required) from day one.

  • 02.

    Build an evaluation harness that records decay/contradiction/variance metrics alongside test artifacts.
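For the greenfield path, the prompt registry with versioning and session policies might look like the following sketch. All names here (`PromptRegistry`, `require_drc`) are illustrative assumptions; versions are derived from a content hash so any edit to a template produces a new, auditable version.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class PromptRegistry:
    """Versioned prompt store with a per-prompt session policy
    (e.g., whether DRC mode is required to use it)."""
    _store: dict = field(default_factory=dict)

    def register(self, name, template, require_drc=False):
        # Content-addressed version: editing the template yields a new version.
        version = hashlib.sha256(template.encode()).hexdigest()[:12]
        self._store.setdefault(name, []).append(
            {"version": version, "template": template, "require_drc": require_drc}
        )
        return version

    def latest(self, name):
        """Most recently registered version of a named prompt."""
        return self._store[name][-1]
```

Recording the registry version alongside each eval artifact ties the decay/contradiction/variance metrics back to the exact prompt that produced them.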