Skip to content

fix(copilot): make createRunSegment idempotent on stream_id, surface swallowed PG errors#5379

Open
Sg312 wants to merge 1 commit into
devfrom
improvement/402-error-observability
Open

fix(copilot): make createRunSegment idempotent on stream_id, surface swallowed PG errors#5379
Sg312 wants to merge 1 commit into
devfrom
improvement/402-error-observability

Conversation

@Sg312

@Sg312 Sg312 commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes the recurring copilot_runs insert failure on the headless copilot path. The headless lifecycle derives copilot_runs.stream_id (UNIQUE) from the message id but generates a fresh run id per attempt, so any client retry of the same message was guaranteed to fail with a duplicate-key error (~26 failing traces/day in prod, surfaced only as an opaque Drizzle "Failed query" span with the real Postgres error swallowed). createRunSegment is now idempotent on stream_id, replays continue under the original run identity, and both call sites log the buried Postgres error cause so any remaining failure mode (e.g. an FK violation) is diagnosable from logs. Also syncs the generated contract mirrors from the companion mothership change (additive only: copilot.auth.rejections metric, auth.reject_reason attribute + enum).

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

  • Touched-module suites pass (25 tests): lib/copilot/async-runs/{repository,lifecycle}.test.ts, lib/copilot/request/lifecycle/{run,start,headless}.test.ts.
  • Full tsc --noEmit clean; biome clean on changed files.
  • Contract mirror diff verified additive-only against the regenerated mothership schemas.
  • Reviewers should focus on: the onConflictDoNothing + fetch-existing semantics in createRunSegment (a replayed message now resolves to the pre-existing segment instead of erroring), and ensureHeadlessRunIdentity returning the existing row's executionId/id on replay.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing — existing suites pass; no new test files (behavior covered by the repository/lifecycle suites)
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Companion PR

Companion: https://github.com/simstudioai/mothership/pull/341

…swallowed PG errors

The headless lifecycle derives copilot_runs.stream_id (UNIQUE) from the
message id but generates a fresh run id per attempt, so any client retry
of the same message was guaranteed to fail the insert with a duplicate-key
error (~26 failing traces/day in prod, logged as an opaque Drizzle
"Failed query" with the real Postgres error swallowed).

- createRunSegment: onConflictDoNothing on stream_id + fetch the existing
  row, so a replay resolves to the original segment instead of erroring.
- ensureHeadlessRunIdentity continues under the existing segment's
  execution/run ids on replay rather than fabricating fresh ones.
- Both catch sites now log the Postgres error cause (constraint/detail)
  that Drizzle buries, so any remaining failure mode (e.g. FK violation)
  is diagnosable from logs.
- Regenerated contract mirrors (metrics-v1, trace-attributes-v1,
  trace-attribute-values-v1) from the companion mothership change:
  copilot.auth.rejections metric + auth.reject_reason attribute/enum
  (additive only).

Companion: mothership stops marking auth rejections as span errors and
adds the rejection counter + Loki log export.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@vercel

vercel Bot commented Jul 2, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jul 2, 2026 7:30pm

Request Review

@greptile-apps

greptile-apps Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Too many files changed for review. (147 files found, 100 file limit)

Bypass the limit by tagging @greptile-apps to review.

@github-actions github-actions Bot added the requires-mothership-merge Has a companion PR on the mothership/copilot side — merge in lockstep label Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

⚠️ Cross-repo companion check

One or more companion PRs aren't merged into dev yet. Merging this without them will leave copilot and sim out of sync — merge them in lockstep.

  • simstudioai/mothership#341OPEN, not merged (targets dev) — improvement(observability): track auth rejections properly, ship logs to Loki, f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

requires-mothership-merge Has a companion PR on the mothership/copilot side — merge in lockstep

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant