Example 4: AI-Driven Publishing Pipeline

Domain: Cascade intelligence case study research — from raw idea to live publication
Stakes: Factual accuracy at public scale, editorial integrity, reputational exposure across 247 published analyses
Context: semantic-cal-case-studies — a real production pipeline where AI agents research, author, validate, compile, and deploy. The last three cases were built end-to-end without a single post-publish fix.

Real, not illustrative. This case study documents constraints, incidents, and architectural decisions from an active production system. All details are from the actual pipeline.

The stack

6D Foraging Methodology scores cascade events across six business dimensions. CAL encodes and validates those scores. RECALL compiles JSON briefs into published HTML. GDD governs the pipeline that connects them.


The Situation

The pipeline takes a cascade idea — a structural business event worth analysing — and produces a live, citable, DOI-archived case study at uc-XXX.stratiqx.com in a single agentic session.

At full speed the arc looks like this:

Scope → Research → Brief (JSON) → Audit (0/0/0) 
  → Generate HTML → Verify facts → Deploy to KV
  → Update index → Update banner → Promote

Each step is AI-executed. The human makes one decision at the end: publish or not.
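
As a rough illustration, the arc above can be sketched as an ordered list of steps with one human decision. This is a minimal sketch, not the production orchestrator: the step names paraphrase the arc, `run_pipeline` and the callables passed to it are hypothetical, and placing the publish decision immediately before deploy is an assumption.

```python
from typing import Callable, Dict, List

# Step names paraphrase the arc above; the split into two phases is an assumption.
PRE_DEPLOY = ["scope", "research", "brief", "audit", "generate_html", "verify_facts"]
POST_DECISION = ["deploy", "update_index", "update_banner", "promote"]


def run_pipeline(steps: Dict[str, Callable[[], bool]],
                 approve: Callable[[], bool]) -> List[str]:
    """Run steps in order and return the names of the steps that completed."""
    done: List[str] = []
    for name in PRE_DEPLOY:
        if not steps[name]():
            return done  # a failed step halts the run before deploy
        done.append(name)
    if not approve():  # the single human decision: publish or not
        return done
    for name in POST_DECISION:
        if not steps[name]():
            return done
        done.append(name)
    return done
```

In this sketch every AI-executed step returns a boolean, so a failed audit stops the run before any HTML is generated, and nothing reaches deploy unless `approve()` returns true.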

The instinct is to let it run. The pipeline is fast. The AI cites sources. The HTML compiles. Deploy takes two seconds.

GDD says: what does "correct" mean here, and can you prove it before you ship?

That question was not answered upfront. UC-236 answered it for us.


The Incident (UC-236)

UC-236 shipped with three factual errors:

  • "Kadri scored in Game 1" — the source said Game 4
  • "zero playoff rounds won" — the subject had won two rounds, never advancing past the second
  • A third stat cited correctly from a source that was itself stale

The model had cited sources accurately. The sources existed. The citations matched. The audit passed. The HTML compiled.

None of those checks caught the errors. Three fix rounds. Post-publish corrections. A gap between what the pipeline said it was checking and what it was actually checking.

This is the failure mode GDD is designed to surface before execution.


The ICR Cycle

FORMALIZE — stated constraints

[1] A case is factually correct if every claim is supported by a cited source.
[2] A case is publishable if it passes the editorial one-sentence test:
    "Does this case analyse a structural business cascade?"
[3] HTML is structurally valid if audit returns 0/0/0 (errors/warnings/info).
[4] CAL syntax is correct if the recall compiler accepts it without errors.
[5] Citations are complete if every cite[N] has a matching source-N div.
[6] Sources are credible if they are Tier 1 (Reuters, Bloomberg, WSJ)
    or Tier 2 (CNBC, TechCrunch, Crunchbase) — Tier 3 flagged for review.
[7] The pipeline is complete when the case is deployed and indexed.
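
Constraints [3] and [6] are mechanical enough to sketch in code. The following is an illustrative sketch only: the function names are hypothetical, the audit result is assumed to arrive as an "errors/warnings/info" string, and any outlet not named in [6] is assumed to fall into Tier 3.

```python
from typing import Iterable, List

# Outlets named in constraint [6]; treating everything else as Tier 3 is an assumption.
TIER_1 = {"Reuters", "Bloomberg", "WSJ"}
TIER_2 = {"CNBC", "TechCrunch", "Crunchbase"}


def audit_passes(result: str) -> bool:
    """Constraint [3]: pass only on 0/0/0 (errors/warnings/info)."""
    errors, warnings, info = (int(part) for part in result.split("/"))
    return errors == 0 and warnings == 0 and info == 0


def source_tier(outlet: str) -> int:
    """Constraint [6]: map an outlet name to its credibility tier."""
    if outlet in TIER_1:
        return 1
    if outlet in TIER_2:
        return 2
    return 3


def flag_for_review(outlets: Iterable[str]) -> List[str]:
    """Return the outlets that need human review (Tier 3)."""
    return [o for o in outlets if source_tier(o) == 3]
```

Neither check says anything about whether a claim is true. A Tier 1 outlet and a clean audit are both compatible with a misread or stale fact.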

STRESS — what breaks each

[1] breaks if: the model cites a source accurately but misreads it
              the source itself is factually wrong or stale
              a claim appears in a section the model did not verify

[2] breaks if: a cascade thesis is technically about business
              but is political commentary dressed as structural analysis —
              passes the surface test, fails the editorial intent

[4] breaks if: CAL compiles but uses REVIEW instead of SURFACE review ON
              for a prognostic case — syntactically valid, semantically wrong,
              breaking downstream review-window logic

[5] breaks if: a citation is added in revision but its source div is not —
              the link exists, the reference does not — invisible in preview,
              broken in production

[6] breaks if: a Tier 1 source is cited accurately but the source
              reported an unconfirmed figure at time of publication —
              the citation is valid; the fact is not
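
The failure described for [5] can be caught structurally rather than by eye. The sketch below assumes citations appear in the HTML as the literal text cite[N] and that each source is a div whose id attribute is source-N; the real markup may differ, and `missing_sources` is a hypothetical helper, not part of the audit tool.

```python
import re
from typing import List

# Assumed markup: inline markers "cite[N]" and source blocks <div id="source-N">.
CITE_RE = re.compile(r"cite\[(\d+)\]")
SOURCE_RE = re.compile(r"<div[^>]*\bid=[\"']source-(\d+)[\"']")


def missing_sources(html: str) -> List[int]:
    """Return citation numbers that have no matching source div."""
    cited = {int(n) for n in CITE_RE.findall(html)}
    present = {int(n) for n in SOURCE_RE.findall(html)}
    return sorted(cited - present)
```

An empty list means every citation resolves to a source div. It does not mean the source supports the claim.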

CHECK — internal consistency

Three conflicts surface:

[CONFLICT A] Statement [1] treats "cited" as equivalent to "correct."
             Stress test on [1] shows they are not.
             A model can cite accurately and still misread — or a cited
             source can be stale or wrong.
             [1] and [6] are different constraints. Previously treated as one.

[CONFLICT B] Statement [4] assumes compiler acceptance = semantic correctness.
             REVIEW and SURFACE review ON both compile.
             Only one is semantically valid for a prognostic case.
             Syntactic validation cannot catch semantic miscategorisation.

[CONFLICT C] Statement [3] validates structure. Statement [5] validates
             citation completeness. Statement [1] assumes factual accuracy.
             All three are distinct checks previously collapsed into one
             review pass, and none of them verifies the others.

SURFACE — unresolvable residue

ITEM 1: Are AI-cited facts verified against sources, or merely cited?
  Cannot resolve from: audit output, compile result, or citation count
  Requires: an explicit claim-verification step before deploy
  Consequence if skipped: UC-236 — post-publish corrections,
                          credibility damage, reader trust erosion
  Resolution: cal_verify_case (Step 4.5) — built after UC-236

ITEM 2: Is the cascade thesis editorially original?
  Cannot resolve by machine: cross-reference checks surface related cases
                             but cannot judge thesis novelty or framing quality
  Decision required from: author, every case
  This is the designed-in human gate — it cannot be automated

ITEM 3: Is the CAL block semantically correct for its case type?
  Cannot resolve from: compile success alone
  Requires: explicit semantic validation of SURFACE review ON
            for prognostic cases, including trigger and window fields
  Resolution: audit step enforces this; schema check added to audit rules
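
The idea behind ITEM 3 can be illustrated with a check that runs after a successful compile. This is a sketch under loud assumptions: it treats the CAL block as plain text with one statement per line, which is a simplification and not CAL's actual grammar, and `prognostic_review_issues` is a hypothetical name. It leaves out the trigger and window fields because their syntax is not shown here.

```python
from typing import List


def prognostic_review_issues(cal_text: str, case_type: str) -> List[str]:
    """Flag review statements that compile but are wrong for a prognostic case.

    Assumes one statement per line; this is not CAL's real grammar.
    """
    if case_type != "prognostic":
        return []
    lines = [ln.strip() for ln in cal_text.splitlines() if ln.strip()]
    issues: List[str] = []
    if not any(ln.startswith("SURFACE review ON") for ln in lines):
        issues.append("missing SURFACE review ON")
    if any(ln.split()[0] == "REVIEW" for ln in lines):
        issues.append("bare REVIEW used in a prognostic case")
    return issues
```

The design point is that this check needs the case type as an input. Compiler acceptance alone cannot supply it.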

GATE

Three gates now designed into the pipeline before deploy:

  1. Audit: 0/0/0 — structural, citation, and CAL semantic validation. Nothing proceeds with warnings.
  2. cal_verify_case — explicit fact verification, claim by claim, against cited sources. This is Step 4.5. It was not in the original pipeline. It exists because of UC-236.
  3. Author editorial review — does the thesis earn publication? This gate is human, permanent, and not a formality. It is the designed-in point where machine execution stops and judgment begins.
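
To make the second gate concrete, here is one way a claim-by-claim check could work. It is not the actual cal_verify_case implementation, whose internals are not described here. It assumes each claim carries the exact passage said to support it, and that the text of each cited source is available keyed by citation number; `Claim` and `unverified_claims` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Claim:
    text: str       # the sentence as it appears in the case study
    source_id: int  # N in cite[N]
    evidence: str   # exact passage said to support the claim (assumed field)


def _normalise(s: str) -> str:
    """Lowercase and collapse whitespace so line wraps do not cause misses."""
    return " ".join(s.lower().split())


def unverified_claims(claims: List[Claim], sources: Dict[int, str]) -> List[Claim]:
    """Return claims whose evidence passage is absent from the cited source."""
    failed: List[Claim] = []
    for claim in claims:
        evidence = _normalise(claim.evidence)
        source_text = _normalise(sources.get(claim.source_id, ""))
        if not evidence or evidence not in source_text:
            failed.append(claim)
    return failed
```

A check of this shape would catch a misread such as "Game 1" where the source says "Game 4", because the claimed passage is not in the source. It would not catch a source that is itself stale or wrong, which is why source credibility and the human gate remain separate checks.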

What Changed

The constraints were not invented. They were already operating — implicitly, inconsistently, in the author's head during manual review. GDD made them explicit, stress-tested them, and found the conflicts.

The result:

  • cal_verify_case is now Step 4.5 in every publishing run: an explicit claim-verification step built directly from the UC-236 incident
  • The audit schema enforces SURFACE review ON for prognostic cases before HTML generation
  • Citation completeness is checked structurally, not visually
  • The human gate is not "does this look right" — it is "does this thesis earn a DOI"

The last three case studies were built end-to-end in a single agentic session each — research, brief, audit, generate, verify, deploy, index, banner, promote — with zero post-publish corrections.

The pipeline did not get faster. The constraint set got explicit.


The Unresolvable Residue

After 247 published cases, one constraint has never been resolved by machine and never will be:

Is this cascade thesis original enough, well-framed enough, and structurally honest enough to publish under a permanent DOI?

Every case study sits in front of that gate. The machine does everything else. That decision is human — not as a formality, but as the designed-in answer to "do we know what correct means, and can we prove it?"

The answer the machine cannot give is the most valuable output the pipeline produces.

