feat(run-ops): ClickHouse multi-source replication fan-in + admin ops#4119
feat(run-ops): ClickHouse multi-source replication fan-in + admin ops#4119d-cs wants to merge 4 commits into
Conversation
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
apps/webapp/test/runsReplicationInstance.test.ts (1)
221-425: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winHeavy duplication of org/project/env/PG17-container setup boilerplate across both integration blocks.
Both
replicationContainerTestblocks (lines 227-336 and 347-422) repeat nearly identical org/project/runtimeEnvironment creation,createPostgresContainer/REPLICA IDENTITY FULLsetup, and try/finally teardown. Extracting a shared test fixture helper (e.g., intest/utils/) for "seed org/project/env/run" and "spin up dual PG sources with REPLICA IDENTITY FULL" would reduce this to a few lines per test and cut duplication with the similar helpers inrunsReplicationService.part8.test.ts.apps/webapp/test/runsReplicationService.part8.test.ts (2)
96-144: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
seedFkRows/seedRunfixture-seeding logic is duplicated three times in this file (and again across sibling test files).Each of the three tests re-implements essentially the same organization/project/runtimeEnvironment/taskRun creation logic with minor parameter differences. This is the same pattern duplicated in
runsReplicationInstance.test.tsandrunsReplicationService.part9.test.ts(seedRun). Extracting a shared "seed FK chain" helper into a test util module would reduce this to one implementation reused across all three files here and the sibling test files.Also applies to: 275-323, 443-495
152-167: 🩺 Stability & Availability | 🔵 Trivial | 🏗️ Heavy liftFixed sleeps (
setTimeout(500/2000/3000)) instead of polling risk CI flakiness.All three tests here wait a fixed duration for replication to flush ("Wait for BOTH streams to flush into ClickHouse"), unlike
runsReplicationService.part9.test.ts'swaitForLagHistogram, which polls with a deadline. Under container/CPU contention a fixed 3s sleep can be too short and cause intermittent failures. Consider adopting the same poll-until-condition pattern used in part9 for these ClickHouse assertions.Also applies to: 325-336, 497-507
apps/webapp/test/runsReplicationService.part9.test.ts (1)
11-51: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
makeMetricReadersis explicitly copied frompart4.test.ts.The comment at lines 11-12 acknowledges this is copy-pasted from another test file. Since this helper (and the polling helper
waitForLagHistogram) is now needed in at least two part-test files, consider moving it intotest/utils/tracing.tsalongsidecreateInMemoryMetricsto avoid drift between the two copies.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: cbfa6793-3aeb-419f-b19b-0c52a1d7cf8e
📒 Files selected for processing (9)
apps/webapp/app/routes/admin.api.v1.runs-replication.backfill.tsapps/webapp/app/routes/admin.api.v1.runs-replication.status.tsapps/webapp/app/services/runsReplicationGlobal.server.tsapps/webapp/app/services/runsReplicationInstance.server.tsapps/webapp/app/services/runsReplicationService.server.tsapps/webapp/app/v3/services/adminWorker.server.tsapps/webapp/test/runsReplicationInstance.test.tsapps/webapp/test/runsReplicationService.part8.test.tsapps/webapp/test/runsReplicationService.part9.test.ts
515b897 to
cb97148
Compare
5de29cb to
5c2d010
Compare
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
commit: |
c59d9c5 to
d5d7fa1
Compare
8fb6e8a to
4e08dc7
Compare
a1ff262 to
a8068e9
Compare
4e08dc7 to
11dc0b7
Compare
a8068e9 to
0ef3a6b
Compare
11dc0b7 to
a9bc9e6
Compare
0db90f0 to
d5415e8
Compare
a9bc9e6 to
277ecea
Compare
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…comments/test names Remove the internal plan-enumeration labels from runs-replication comments and test names, keeping the behavioral descriptions intact. Comment/label hygiene only; no product logic or test behavior changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, fix test status type - Register the implicit single source with id "legacy" so its leader-lock key matches the id the admin status route probes; otherwise leadership always reads false in the non-split config. - Guard the shutdown-path client.stop() fan-out against re-firing per incoming transaction and add a catch so rejections don't surface as unhandled. - Use the TaskRunStatus type alias (not the const value) for status annotations in the dual-source dedup tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
277ecea to
2bba3b8
Compare
What
Extends the ClickHouse runs-replication service to fan in from multiple Postgres sources (the control-plane DB and the run-ops DB) instead of a single source, plus the admin operations to run and observe it.
services/runsReplicationService.server.ts, newrunsReplicationInstance.server.ts,runsReplicationGlobal.server.ts): factors the replication service into per-source instances and a coordinator so a single ClickHouse target is fed from more than one Postgres source.routes/admin.api.v1.runs-replication.status.ts,admin.api.v1.runs-replication.backfill.ts,v3/services/adminWorker.server.ts): adds a status endpoint reporting per-source replication state and updates the backfill entrypoint for the multi-source shape.Why
PR7 of the run-ops split stack, and the final piece: once run state can live in a separate run-ops DB (earlier PRs), the analytics replication into ClickHouse has to consume both sources so runs remain queryable regardless of residency. Behavior-changing for the replication service internals; the ClickHouse-facing output is unchanged (still one runs stream), and single-source operation is preserved when the split is not enabled.
Tests
New vitest coverage:
runsReplicationInstance.test.ts(per-source instance behavior) andrunsReplicationService.part8/part9suites exercising the multi-source coordinator. Testcontainers-backed (ClickHouse + Postgres); no mocks.Notes
Draft, stacked on #4118 (
runops/pr06-write-path). Review that first; this diff is against it.Server-change / changeset note to be added at stack-assembly time.
🤖 Generated with Claude Code