
Operator troubleshooting

Most pipeline failures are transient and recover on their own; the rest you fix from the production detail page with the built-in retry/override actions. This page covers the failure modes you’ll actually see and how to clear them.

Recover a failed production
  1. Open the production’s detail page and find the step with the ❌ Failed badge.
  2. Read the step’s error (code + detail).
  3. Fix the cause if it’s permanent (e.g. a credential, a contract); for transient errors, just click Retry step.
  4. For exhausted (dead-lettered) messages, fix the cause first, then have an admin resubmit them from Settings → DLQ management.

Step failures

  • Failed status on a step → click Retry step on the production detail page. Recoverable causes: AI provider 5xx, timeouts, transient SQL errors.
  • Dead-lettered → the step exhausted retries. Investigate the error code in the step’s ValidationErrorDetail, then either retry or override. (Admins resubmit from Settings → DLQ management — see Operations & reliability.)
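
The triage above can be sketched as a small decision function. This is an illustration only, not JARAI's implementation: the status labels, the error-code names in `TRANSIENT_CODES`, and `next_action` itself are hypothetical stand-ins for what you read off the production detail page.

```python
from dataclasses import dataclass

# Hypothetical code names for the recoverable causes listed above
# (AI provider 5xx, timeouts, transient SQL errors). Real codes will differ.
TRANSIENT_CODES = {"PROVIDER_5XX", "TIMEOUT", "SQL_TRANSIENT"}


@dataclass
class StepState:
    status: str      # "Failed" or "DeadLettered" (hypothetical labels)
    error_code: str  # code taken from the step's error detail


def next_action(step: StepState) -> str:
    """Map a step's state to the operator action described in this section."""
    if step.status == "DeadLettered":
        # Retries are exhausted: investigate before retrying, overriding,
        # or asking an admin to resubmit.
        return "investigate, then retry/override or admin resubmit via DLQ management"
    if step.status == "Failed":
        if step.error_code in TRANSIENT_CODES:
            return "retry step"
        # Anything else is treated as permanent (e.g. a credential problem).
        return "fix cause, then retry step"
    return "no action"
```

For example, `next_action(StepState("Failed", "TIMEOUT"))` returns `"retry step"`, while an unrecognized code on a Failed step returns `"fix cause, then retry step"`.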

Approved-with-warnings (image safety/rights)

When an image carries warning flags (low safety score, high copyright risk, attribution required), the asset still proceeds, but you must acknowledge the warnings in the pre-publish rights-review drawer before publishing. See Approval and rights review.

Missing assets

If the per-scene image count is below the brief’s planned scene count, the production can stall later in the pipeline (around the platform-variant step). Check the image step’s output on the production detail page and re-run it if needed.
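
If you export the image step's output, the check above amounts to comparing scene numbers. A minimal sketch, assuming one image per scene and 1-based scene numbering; the function name and input shape are hypothetical, not a JARAI API:

```python
def missing_scene_images(planned_scene_count: int, image_scene_numbers: list[int]) -> list[int]:
    """Return the scene numbers (1-based) that have no image yet.

    planned_scene_count: scene count from the brief.
    image_scene_numbers: scene number of each image the image step produced.
    """
    present = set(image_scene_numbers)
    return [n for n in range(1, planned_scene_count + 1) if n not in present]
```

An empty result means every planned scene has an image; a non-empty result is the list of scenes to look at before re-running the image step.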

Provider outages

If an AI provider (OpenAI, ElevenLabs, etc.) is in an outage, the affected steps exhaust retries. JARAI routes around a Failing provider using the model-selection chain where it can; check the Status page for current incidents.
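
Conceptually, routing around a provider means walking an ordered chain and taking the first entry that is not marked Failing. The sketch below shows that idea only; it is an assumption about the general pattern, not JARAI's actual model-selection code, and the provider names are placeholders.

```python
from typing import Optional


def pick_provider(chain: list[str], failing: set[str]) -> Optional[str]:
    """Return the first provider in the chain that is not failing, or None."""
    for provider in chain:
        if provider not in failing:
            return provider
    # Every provider in the chain is failing: the step cannot be rerouted.
    return None
```

When the function returns `None`, no fallback is left, which corresponds to the case where affected steps exhaust their retries during an outage.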

Stuck productions

A production that stalls mid-pipeline (e.g. a message lost to an infrastructure blip) is usually re-driven automatically by the recovery process — give it a few minutes before intervening. If it stays stuck, retry the current step.
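
The wait-then-retry rule can be written down as a simple time check. This is a sketch under stated assumptions: the 10-minute default stands in for "a few minutes" and is not a documented JARAI threshold, and `stuck_action` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def stuck_action(
    last_progress: datetime,
    now: datetime,
    grace: timedelta = timedelta(minutes=10),  # assumed value, tune to taste
) -> str:
    """Decide whether to keep waiting for auto-recovery or retry the step."""
    if now - last_progress < grace:
        return "wait for auto-recovery"
    return "retry current step"
```

Pass the timestamp of the production's last visible progress as `last_progress`; both datetimes should be either naive or timezone-aware so they can be subtracted.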

Quick reference

| Symptom | Do this |
| --- | --- |
| One step shows Failed | Retry step (after fixing the cause if permanent) |
| Step dead-lettered | Admin: inspect + resubmit via DLQ management |
| Image warnings | Acknowledge in the rights-review drawer before publishing |
| Many failures, one vendor | Check provider health / Status; the chain should route around it |
| Production won’t start | Check the account’s budget/allowance and concurrency limits |
| Production stuck | Wait for auto-recovery; if persistent, retry the current step |

Still stuck? See the FAQ or Contact support.