
Add server-side publishing guide for AI Transport #3227

Open
zknill wants to merge 3 commits into main from zak/ait-443/server-side-token-streaming-publish

Conversation

@zknill
Contributor

@zknill zknill commented Feb 25, 2026

New page in AI Transport > Token streaming covering Realtime connections, message ordering guarantees, transient publishing and channel limits, per-connection rate limits for both message-per-response and message-per-token patterns, and a connection pool example for handling multiple concurrent streams.


I've additionally included a nice single-page web-app here so that you can see / test the AblyConnectionPool code:

test-connection-pool.html

Just open that HTML page in your browser and provide a prod API key.

@zknill zknill requested a review from mschristensen February 25, 2026 17:34
@coderabbitai

coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@mschristensen mschristensen added the review-app Create a Heroku review app label Feb 27, 2026
@ably-ci ably-ci temporarily deployed to ably-docs-zak-ait-443-s-whtjs0 February 27, 2026 17:58 Inactive
Contributor

@mschristensen mschristensen left a comment


Some nice info in here.

I'm not sure about the connection pool implementation specifics here, but I think the abstraction could be useful. I wonder if it's worth just implementing it in e.g. ably-js and reviewing the implementation through a PR there?


## Transient publishing and channel limits <a id="transient"/>

In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server only publishes to a channel without subscribing, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on [number of channels per connection](/docs/platform/pricing/limits#connection).
Contributor

Suggested change
In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server only publishes to a channel without subscribing, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on [number of channels per connection](/docs/platform/pricing/limits#connection).
In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server publishes to a channel without attaching first, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on the [number of channels per connection](/docs/platform/pricing/limits#connection).
<Aside data-type="note">
The server must attach to the channel in order to subscribe to it. In this case, the SDK client instance will not use transient publishing.
</Aside>


All message actions use the same transient publish path, including `publish()` and `appendMessage()`. This means a single connection can publish to thousands of distinct channels without hitting the channel limit. No additional configuration is required. When you call `publish()` or `appendMessage()` on a channel that the client has not explicitly attached to, the SDK publishes transiently, without attaching the channel.
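As a rough illustration of the behaviour described above, here is a minimal sketch (hypothetical helper and channel naming; this is not code from the PR, it simply follows the ably-js publish API):

```javascript
// Sketch (hypothetical helper): publish one token for a session over a
// shared Realtime client. channels.get() does not attach the channel,
// and no attach() is called, so the SDK uses the transient publish path
// and the channel does not count toward the per-connection channel limit.
async function publishToken(realtime, sessionId, text) {
  const channel = realtime.channels.get(`response:${sessionId}`);
  await channel.publish('token', { text });
}
```

The same shared `realtime` client can be reused across many sessions; only the message rate, not the channel count, constrains it.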

The constraint to be aware of is the [per-connection inbound message rate](/docs/platform/pricing/limits#connection), not the number of channels.
Contributor

Why is this the case?

Contributor Author

Expanded this section to explain

If you also need to subscribe to channels on the same connection, those subscriptions require explicit attachment and will count toward the channel limit.
</Aside>
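To contrast with the publish-only case, a hedged sketch of the subscribe path (hypothetical helper names; it relies on the ably-js behaviour that `subscribe()` implicitly attaches the channel):

```javascript
// Sketch (hypothetical helper): subscribing requires an attached channel,
// so every channel subscribed this way counts toward the per-connection
// channel limit, unlike publish-only (transient) channels.
async function subscribeToSession(realtime, sessionId, onToken) {
  const channel = realtime.channels.get(`response:${sessionId}`);
  // subscribe() implicitly attaches the channel in ably-js.
  await channel.subscribe('token', onToken);
  return channel;
}
```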

## Per-connection rate limits <a id="rate-limits"/>
Contributor

I feel that the content in this section belongs in the existing /docs/ai-transport/token-streaming/token-rate-limits, were you aware of that page?

Contributor Author

@zknill zknill Mar 2, 2026


were you aware of that page?

yes

Contributor Author

I've re-reviewed this page and the token rate limits one, and I think:

  1. this page does a better job of actually explaining the concept
  2. the token rate limits stuff here motivates the next section (client/connection pooling)

I'm not sure what our position is on overlapping content, but I think it's more natural to be able to consume all the things you care about here in one doc. But I appreciate it's not great having the same content in two places with slightly different explanations.


<Code>
```javascript
class AblyConnectionPool {
Contributor

I'm wondering whether we should call this AblyClientPool. I know there is one connection per client, but there is the concept of a connection inside the client (e.g. the connection state listener client.connection.on etc), so it feels a bit weird to use the same name a layer above the client.

(We also call it getClient below)

Contributor

If we think an abstraction like this is useful, I wonder if it's worth adding to the SDK

Contributor Author

I don't mind where we add it. We are not currently planning to ship something like this in [any|all] of the SDKs, so the only place I've got to put it is here. I agree, though, that it's sort of too big to go in docs, and too small to feel like it justifies adding to ably-js (which has a heavy client-side lean, given most server-side publishing is currently REST).

newClient.connection.on((stateChange) => {
console.warn(`[Pool conn ${index}] ${stateChange.previous} → ${stateChange.current}`);
if (stateChange.current === 'failed') {
this._replaceConnection(index);
Contributor

If a connection fails, there is probably a network issue, and creating a new instance seems unlikely to recover the situation. (Also, in this case, if the new client's connection enters the failed state as a result, this might overflow the call stack?)


When your server handles more concurrent AI response streams than a single connection supports, create additional Realtime clients. Each client uses its own connection with its own message rate budget, so throughput scales linearly with the number of connections.

Route channels to connections using consistent hashing so that all operations for a given channel always go through the same connection. This preserves [message ordering](#ordering) for each response.
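A rough sketch of such routing (assumed names, and a simple hash-modulo scheme rather than the PR's actual implementation): the channel name is hashed to pick a client, so every operation for one channel always uses the same client and connection.

```javascript
// Sketch of hash-based channel routing across a fixed pool of clients.
// Note this is plain modulo hashing, not true consistent hashing: resizing
// the pool remaps most channels. Stable routing per channel preserves
// per-response message ordering, since each response uses one connection.
class AblyClientPool {
  // createClient would typically be (i) => new Ably.Realtime(options).
  constructor(size, createClient) {
    this.clients = Array.from({ length: size }, (_, i) => createClient(i));
  }

  // djb2 string hash; any stable hash of the channel name works here.
  static hash(name) {
    let h = 5381;
    for (let i = 0; i < name.length; i++) {
      h = (h * 33 + name.charCodeAt(i)) >>> 0;
    }
    return h;
  }

  // All operations for a given channel go through the same client.
  getClient(channelName) {
    return this.clients[AblyClientPool.hash(channelName) % this.clients.length];
  }
}
```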
Contributor

The implementation looks like standard modulo hashing, not consistent hashing

Contributor Author

yeah, nice! fixed

Simplify the transient publishing section to clarify when
transient publish is used and why the per-connection inbound
message rate is the binding constraint. Rename AblyConnectionPool
to AblyClientPool for accuracy since the pool manages client
instances. Fix "consistent hashing" terminology to "hash function"
since the implementation uses modulo hashing. Remove the
_replaceConnection method which could cause stack overflow and
is unlikely to recover from network failures.
@ably-ci ably-ci temporarily deployed to ably-docs-zak-ait-443-s-whtjs0 March 2, 2026 16:49 Inactive
@zknill
Contributor Author

zknill commented Mar 2, 2026

Thanks for the review @mschristensen, two things left to decide:

  1. What to do about the overlapping content around appendRollupWindow and rate limits?
  2. What to do about the suggestion / thought of moving the AblyClientPool into the ably-js SDK?

I don't have strong views on these; I could make arguments either way (but I've pushed as-is because I think the current version of leaving both here is my favourite).

@zknill zknill requested a review from mschristensen March 2, 2026 16:50
@zknill
Contributor Author

zknill commented Mar 3, 2026

Decided:

  1. the token rate limits thing should go on the token rate limit page
  2. move the connection/client pooling thing into ably-js

Move detailed rate limit content (rollup tables, code examples,
concurrent streams data) from server-publishing.mdx to
token-rate-limits.mdx where it belongs. Replace the full
AblyClientPool implementation with concise conceptual guidance,
as the pool abstraction will be implemented in ably-js instead.

server-publishing.mdx now links to token-rate-limits.mdx for
rollup configuration details and retains a brief overview section
for each token streaming pattern.
