http2: reduce per-request overhead on the server path #64265

Open: mcollina wants to merge 4 commits into nodejs:main from mcollina:http2-perf-r2

@mcollina mcollina commented Jul 2, 2026

Member

This PR cuts a number of per-request/per-stream costs on the HTTP/2 server hot path, in four commits.

1. http2: reduce per-request allocations

  • Track 'priority'/'frameError' stream listeners by overriding the EventEmitter methods on Http2Stream instead of subscribing to 'newListener'/'removeListener'. The previous approach made every listener add/remove on every stream emit an extra tracking event (the compat layer alone adds 11 listeners per request).
  • buildNgHeaderString: replace the per-call SafeSet with a lazily allocated array, skip the sensitive-headers map() when there are none (the common case), and skip the HTTP-token regex plus connection-specific-header checks for well-known single-value header names — they are all valid tokens and none of them is connection-specific.
  • Replace per-call closures with shared named handlers in onStreamClose (natural close path), afterShutdown and Http2Stream._destroy.
  • Skip the pendingStreams Set add/delete for streams created with their native handle already available (all server streams).
  • Hoist the per-request onStreamTimeout closure factories in the compat layer, and avoid a once() wrapper allocation per server stream.
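
As a rough illustration of the first bullet, the sketch below tracks listeners for two events by overriding the EventEmitter methods rather than listening for 'newListener'/'removeListener'. `TrackedStream`, `TRACKED_EVENTS` and the `tracked` flags are hypothetical names for this example, not the identifiers used in the PR, and the sketch ignores `prependListener()` and `removeAllListeners()`.

```js
'use strict';
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for Http2Stream: only the listener bookkeeping is modeled.
const TRACKED_EVENTS = new Set(['priority', 'frameError']);

class TrackedStream extends EventEmitter {
  constructor() {
    super();
    // Flags a hot path could read without asking the emitter every time.
    this.tracked = { priority: false, frameError: false };
  }

  // Recompute the flag from the emitter's own count, so it stays correct
  // even if a listener is added twice or removed when it was never added.
  #refresh(name) {
    if (TRACKED_EVENTS.has(name)) {
      this.tracked[name] = this.listenerCount(name) > 0;
    }
    return this;
  }

  on(name, listener) {
    super.on(name, listener);
    return this.#refresh(name);
  }

  addListener(name, listener) {
    super.addListener(name, listener);
    return this.#refresh(name);
  }

  removeListener(name, listener) {
    super.removeListener(name, listener);
    return this.#refresh(name);
  }

  off(name, listener) {
    super.off(name, listener);
    return this.#refresh(name);
  }
}

const stream = new TrackedStream();
const onPriority = () => {};
stream.on('data', () => {});       // untracked event: only a Set lookup is paid
stream.on('priority', onPriority);
console.log(stream.tracked.priority); // true
stream.off('priority', onPriority);
console.log(stream.tracked.priority); // false
```

With this shape, adding a listener for an untracked event costs one Set lookup and no extra event emission, which is the saving the bullet describes.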

2. http2: skip trailers round trip for compat responses

The compat layer always responded with waitForTrailers set, so every response paid for a wantTrailers C++ → JS callback, an empty sendTrailers() submission scheduled through setImmediate(), and an extra empty DATA frame on the wire — even though the vast majority of responses never register trailers.

When the headers are flushed as part of response.end() and no trailers have been registered, there is no further opportunity to add trailers, so waitForTrailers is now skipped. Headers flushed early (writeHead(), write(), flushHeaders()) keep the previous behavior, so trailers can still be added while streaming.

Behavior note for reviewers: trailers added after response.end() are now silently dropped. This matches HTTP/1 response.addTrailers() semantics (docs updated accordingly).
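
The rule described in this section can be sketched as a small pure function. This is a hypothetical model of the decision only: `shouldWaitForTrailers` and its two parameters are names invented for illustration, and the real logic lives inside Node's internal compat layer.

```js
'use strict';

// Hypothetical model of the compat-layer decision described above.
function shouldWaitForTrailers({ headersFlushedByEnd, trailersRegistered }) {
  // If end() is what flushes the headers and no trailers exist yet, user code
  // has no later point at which it could add any, so the round trip is skipped.
  if (headersFlushedByEnd && !trailersRegistered) return false;
  // Early flushes (writeHead(), write(), flushHeaders()) and responses that
  // already registered trailers keep the previous behavior.
  return true;
}

// res.setHeader(...) followed by res.end(...), no trailers:
console.log(shouldWaitForTrailers({ headersFlushedByEnd: true, trailersRegistered: false })); // false
// res.setTrailer(...) called before res.end(...):
console.log(shouldWaitForTrailers({ headersFlushedByEnd: true, trailersRegistered: true })); // true
// res.writeHead(...) first, res.end(...) later:
console.log(shouldWaitForTrailers({ headersFlushedByEnd: false, trailersRegistered: false })); // true
```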

3. http2: avoid per-write closures in kWriteGeneric

Every _write()/_writev() allocated four closures plus an anonymous nextTick callback to coordinate the write callback with the end-of-stream check. Since the stream machinery dispatches at most one write at a time, that state now lives on the stream's kState object with shared named functions. When trailers are pending, the end-of-stream check tick is skipped entirely (the writable side cannot be shut down early anyway). Also pre-initializes the dynamically-added kState fields (shutdownWritableCalled, fd) so hot-path stores no longer transition the object shape.
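
A minimal sketch of the pattern, assuming at most one write is in flight at a time as stated above. `SketchStream`, `onWriteComplete`, `writeCallback` and `completeWrite()` are hypothetical names; only `kState`, `shutdownWritableCalled` and `fd` are taken from the description, and the initial value of `fd` here is an assumption.

```js
'use strict';

const kState = Symbol('state');

// Shared named completion handler: one function serves every write on every
// stream, instead of a fresh closure allocated per write call.
function onWriteComplete(stream, err) {
  const state = stream[kState];
  const callback = state.writeCallback;
  state.writeCallback = null;
  callback(err);
}

class SketchStream {
  constructor() {
    // Every field is created up front so later stores do not change the
    // object's shape. The -1 default for fd is an assumption of this sketch.
    this[kState] = {
      writeCallback: null,
      shutdownWritableCalled: false,
      fd: -1,
    };
    this.written = [];
  }

  write(chunk, callback) {
    const state = this[kState];
    if (state.writeCallback !== null) {
      throw new Error('a write is already in flight');
    }
    // The per-write state lives on the state object, not in a closure.
    state.writeCallback = callback;
    this.written.push(chunk);
  }

  // Stand-in for the native layer reporting that the write finished.
  completeWrite(err = null) {
    onWriteComplete(this, err);
  }
}

const stream = new SketchStream();
stream.write('hello', (err) => console.log('write done, error:', err));
stream.completeWrite(); // prints "write done, error: null"
```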

4. http2: finish empty trailers natively for compat streams

Compat responses that flush headers before end() (writeHead()/write()/flushHeaders()) must keep waitForTrailers, so they paid a wantTrailers C++ → JS callback, an empty sendTrailers() + setImmediate(), and a trailers() call back into C++ on every response. A new internal STREAM_OPTION_AUTO_EMPTY_TRAILERS lets C++ finish the stream itself (same empty DATA + END_STREAM frame, identical wire format) when JS never registered trailers; a later setTrailer() flips the stream back to JS-managed trailers via a new disableAutoTrailers() binding, so streaming trailers work unchanged (regression test added). Compat writeHead()+end(): +5.0% vs the previous commit (47.8k → 50.2k req/s, 8 alternating runs); multi-write streaming ~+1%.
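
The mode switch can be modeled in plain JavaScript as follows. This is a hypothetical sketch of the control flow only: `NativeStreamModel`, `onFinalData()` and the frame objects are invented for illustration, while `disableAutoTrailers()`, `wantTrailers` and `sendTrailers()` mirror the names used in the description. The real implementation is in C++.

```js
'use strict';

// Hypothetical model of the native stream's end-of-data handling.
class NativeStreamModel {
  constructor({ autoEmptyTrailers }) {
    // Models the STREAM_OPTION_AUTO_EMPTY_TRAILERS flag.
    this.autoEmptyTrailers = autoEmptyTrailers;
    this.frames = [];
    this.callbacksIntoJs = 0;
  }

  // Called by JS when trailers get registered after the response started.
  disableAutoTrailers() {
    this.autoEmptyTrailers = false;
  }

  // Called once the final DATA frame of the body has been sent.
  onFinalData(wantTrailers) {
    if (this.autoEmptyTrailers) {
      // Finish natively: empty DATA frame carrying END_STREAM, no JS call.
      this.frames.push({ type: 'DATA', length: 0, endStream: true });
      return;
    }
    this.callbacksIntoJs++;
    wantTrailers(this);
  }

  // JS-managed path: empty trailers produce the same empty DATA frame,
  // non-empty trailers produce a HEADERS frame carrying END_STREAM.
  sendTrailers(trailers) {
    const empty = Object.keys(trailers).length === 0;
    this.frames.push(empty ?
      { type: 'DATA', length: 0, endStream: true } :
      { type: 'HEADERS', headers: trailers, endStream: true });
  }
}

const auto = new NativeStreamModel({ autoEmptyTrailers: true });
auto.onFinalData((s) => s.sendTrailers({}));
console.log(auto.callbacksIntoJs); // 0

const streaming = new NativeStreamModel({ autoEmptyTrailers: true });
streaming.disableAutoTrailers(); // e.g. after a late setTrailer()
streaming.onFinalData((s) => s.sendTrailers({ 'x-checksum': 'abc' }));
console.log(streaming.callbacksIntoJs); // 1
```

In this model the auto path and the JS path with empty trailers emit the same frame, which corresponds to the "identical wire format" claim; only the number of calls into JS differs.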

Benchmarks

h2load (-c 4 -m 100, 1 KiB payload, mean of 6 alternating runs):

| server | main | this PR | Δ |
| --- | --- | --- | --- |
| core API (stream.respond + end) | 61.0k req/s | 70.7k req/s | +15.9% |
| compat API (res.setHeader + end) | 43.7k req/s | 50.4k req/s | +15.3% |
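
For readers checking the Δ column, the relative change can be recomputed from the two throughput figures. `deltaPercent` is a hypothetical helper written for this note, not part of the benchmark tooling.

```js
'use strict';

// Relative change from `before` to `after`, in percent.
function deltaPercent(before, after) {
  return ((after - before) / before) * 100;
}

console.log(deltaPercent(61.0, 70.7).toFixed(1)); // "15.9" (core API)
console.log(deltaPercent(43.7, 50.4).toFixed(1)); // "15.3" (compat API)
```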

benchmark/compare.js (10 runs):

http2/headers.js nheaders=0 n=1000            **      3.98 %       ±2.79%
http2/headers.js nheaders=10 n=1000          ***      7.11 %       ±3.32%
http2/headers.js nheaders=100 n=1000                  1.67 %       ±1.73%
http2/headers.js nheaders=1000 n=1000                -0.32 %       ±1.26%
http2/compat.js duration=5 benchmarker='h2load' clients=2 streams=100 requests=5000    3.02 %  ±4.95%
http2/write.js  duration=5 benchmarker='h2load' size=100000 length=131072 streams=100  1.66 %  ±8.59%

(compat.js/write.js/simple.js stream a file from fs per request, so they are dominated by file streaming and mostly insensitive to per-request overhead; no regressions.)

mcollina added 2 commits July 2, 2026 19:43
Cut several sources of per-stream/per-request overhead on the hot
path:

- Track 'priority'/'frameError' stream listeners by overriding the
  EventEmitter methods on Http2Stream instead of subscribing to
  'newListener'/'removeListener', which made every listener add and
  remove on every stream emit an extra tracking event.
- Replace the per-call SafeSet and sensitive-header mapping in
  buildNgHeaderString with a lazily allocated array and an
  empty-array fast path, and skip the HTTP token regex and
  connection-specific header checks for well-known single-value
  header names.
- Replace per-call closures with shared named handlers in
  onStreamClose, afterShutdown and Http2Stream._destroy.
- Skip the pendingStreams Set add/delete for streams that are
  created with their native handle already available (all server
  streams).
- Hoist the per-request onStreamTimeout closure factories in the
  compat layer to module-level handlers, and avoid a once() wrapper
  allocation per server stream.

h2load, 1 KiB response payload, -c 4 -m 100, mean of 6 alternating
runs: core API 60.2k -> 69.3k req/s (+15%), compat API 43.6k ->
46.2k req/s (+5.9%).

Signed-off-by: Matteo Collina <hello@matteocollina.com>

The compat layer always responded with waitForTrailers set, so every
response paid for a wantTrailers C++ -> JS callback, an empty
sendTrailers() submission scheduled through setImmediate(), and an
extra empty DATA frame on the wire, even though the vast majority of
responses never register any trailers.

When the headers are flushed as part of response.end() and no
trailers have been registered, there is no further opportunity to
add trailers, so waitForTrailers can be skipped altogether. Headers
flushed early (writeHead, write, flushHeaders) keep the previous
behavior so trailers can still be added while streaming.

Trailers added after response.end() are now silently dropped,
matching the HTTP/1 response.addTrailers() semantics.

Also reuse a shared options object for Http2ServerRequest instances
created without explicit options.

h2load, 1 KiB response payload, -c 4 -m 100, mean of 6 alternating
runs: compat API 43.1k -> 49.9k req/s (+15.7% cumulative vs main).

Signed-off-by: Matteo Collina <hello@matteocollina.com>
@nodejs-github-bot

Collaborator

Review requested:

  • @nodejs/http
  • @nodejs/http2
  • @nodejs/net

nodejs-github-bot added the http2 and needs-ci labels Jul 2, 2026
mcollina added 2 commits July 2, 2026 20:52
Every _write()/_writev() on an Http2Stream allocated four closures
and an anonymous nextTick callback to coordinate the write callback
with the end-of-stream check. Since the stream machinery dispatches
at most one write at a time, that coordination state can live on the
stream's kState object instead, with shared named functions for the
end check and completion logic.

When trailers are pending the writable side cannot be shut down
early anyway, so the end-of-stream check tick is now skipped
entirely for those writes.

Also pre-initialize the kState fields that used to be added
dynamically (shutdownWritableCalled, fd) so hot-path stores no
longer transition the object shape.

h2load, 1 KiB response payload, -c 4 -m 100, mean of 6 alternating
runs vs main: core API 61.0k -> 70.7k req/s (+15.9% cumulative),
compat API 43.7k -> 50.4k req/s (+15.3% cumulative).

Signed-off-by: Matteo Collina <hello@matteocollina.com>

When the compat layer flushes response headers before the response
is ended (writeHead(), write(), flushHeaders()), it must keep
waitForTrailers so that trailers can still be added while streaming.
As a result, every such response paid for a wantTrailers C++ -> JS
callback, an empty sendTrailers() with its setImmediate(), and a
trailers() call back into C++, even though most responses never
register any trailers.

Introduce STREAM_OPTION_AUTO_EMPTY_TRAILERS: when set and no
trailers have been handed to the native side by the time the final
DATA frame is sent, the stream is finished directly in C++ with the
same empty DATA frame carrying END_STREAM that the JS path would
have produced, without calling into JS at all. The compat layer
enables this mode whenever it responds with waitForTrailers and no
trailers registered yet; a later setTrailer() call flips the stream
back to JS-managed trailers through a new disableAutoTrailers()
binding, so streaming trailers keep working unchanged.

The wire format is identical in all cases.

h2load -c 4 -m 100, 1 KiB payload, mean of 8 alternating runs
against the previous commit: compat writeHead()+end() 47.8k -> 50.2k
req/s (+5.0%); multi-write streaming responses +1%.

Signed-off-by: Matteo Collina <hello@matteocollina.com>