
Add server-side publishing guide for AI Transport #3227

Open
zknill wants to merge 3 commits into main from zak/ait-443/server-side-token-streaming-publish

Conversation

@zknill
Contributor

@zknill zknill commented Feb 25, 2026

New page in AI Transport > Token streaming covering Realtime connections, message ordering guarantees, transient publishing and channel limits, per-connection rate limits for both message-per-response and message-per-token patterns, and a connection pool example for handling multiple concurrent streams.


I've additionally included a nice single-page web-app here so that you can see / test the AblyConnectionPool code:

test-connection-pool.html

Just open that HTML page in your browser and provide a prod API key.

@zknill zknill requested a review from mschristensen February 25, 2026 17:34
@coderabbitai

coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@mschristensen mschristensen added the review-app Create a Heroku review app label Feb 27, 2026
@ably-ci ably-ci temporarily deployed to ably-docs-zak-ait-443-s-whtjs0 February 27, 2026 17:58 Inactive
Contributor

@mschristensen mschristensen left a comment


Some nice info in here.

I'm not sure about the connection pool implementation specifics here, but I think the abstraction could be useful. I wonder if it's worth just implementing it in e.g. ably-js and reviewing the implementation through a PR there?


## Transient publishing and channel limits <a id="transient"/>

In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server only publishes to a channel without subscribing, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on [number of channels per connection](/docs/platform/pricing/limits#connection).
Contributor

Suggested change
In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server only publishes to a channel without subscribing, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on [number of channels per connection](/docs/platform/pricing/limits#connection).
In a typical AI application, your server publishes responses to many distinct channels, often one per user session. When your server publishes to a channel without attaching first, the SDK uses a [transient publish](/docs/pub-sub/advanced#transient-publish). Transient publishes do not count toward the limit on the [number of channels per connection](/docs/platform/pricing/limits#connection).
<Aside data-type="note">
The server must attach to the channel in order to subscribe to it. In this case, the SDK client instance will not use transient publishing.
</Aside>


All message actions use the same transient publish path, including `publish()` and `appendMessage()`. This means a single connection can publish to thousands of distinct channels without hitting the channel limit. No additional configuration is required. When you call `publish()` or `appendMessage()` on a channel that the client has not explicitly attached to, the SDK publishes transiently, without attaching the channel.
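As a rough illustration of the behaviour described above, here is a minimal sketch (hypothetical helper and channel naming; this is not code from the PR, it simply follows the ably-js publish API):

```javascript
// Sketch (hypothetical helper): publish one token for a session over a
// shared Realtime client. channels.get() does not attach the channel,
// and no attach() is called, so the SDK uses the transient publish path
// and the channel does not count toward the per-connection channel limit.
async function publishToken(realtime, sessionId, text) {
  const channel = realtime.channels.get(`response:${sessionId}`);
  await channel.publish('token', { text });
}
```

The same shared `realtime` client can be reused across many sessions; only the message rate, not the channel count, constrains it.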

The constraint to be aware of is the [per-connection inbound message rate](/docs/platform/pricing/limits#connection), not the number of channels.
Contributor

Why is this the case?

Contributor Author

Expanded this section to explain

If you also need to subscribe to channels on the same connection, those subscriptions require explicit attachment and will count toward the channel limit.
</Aside>
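To contrast with the publish-only case, a hedged sketch of the subscribe path (hypothetical helper names; it relies on the ably-js behaviour that `subscribe()` implicitly attaches the channel):

```javascript
// Sketch (hypothetical helper): subscribing requires an attached channel,
// so every channel subscribed this way counts toward the per-connection
// channel limit, unlike publish-only (transient) channels.
async function subscribeToSession(realtime, sessionId, onToken) {
  const channel = realtime.channels.get(`response:${sessionId}`);
  // subscribe() implicitly attaches the channel in ably-js.
  await channel.subscribe('token', onToken);
  return channel;
}
```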

## Per-connection rate limits <a id="rate-limits"/>
Contributor

I feel that the content in this section belongs in the existing /docs/ai-transport/token-streaming/token-rate-limits, were you aware of that page?

Contributor Author

@zknill zknill Mar 2, 2026


were you aware of that page?

yes

Contributor Author

I've re-reviewed this page and the token rate limits one, and I think:

  1. this page does a better job of actually explaining the concept
  2. the token rate limits stuff here motivates the next section (client/connection pooling)

I'm not sure what our position is on overlapping content, but I think it's more natural to be able to consume all the things you care about here in one doc. But I appreciate it's not great having the same content in two places with slightly different explanations.


<Code>
```javascript
class AblyConnectionPool {
Contributor

I'm wondering whether we should call this AblyClientPool. I know there is one connection per client, but there is the concept of a connection inside the client (e.g. the connection state listener client.connection.on etc), so it feels a bit weird to use the same name a layer above the client.

(We also call it getClient below)

Contributor

If we think an abstraction like this is useful, I wonder if it's worth adding to the SDK

Contributor Author

I don't mind where we add it. We are not currently planning to ship something like this in [any|all] of the SDKs, so the only place I've got to put it is here. I agree, though, that it's sort of too big to go in docs, and too small to feel like it justifies adding to ably-js (which has a heavy client-side lean, given most server-side publishing is currently REST).

newClient.connection.on((stateChange) => {
console.warn(`[Pool conn ${index}] ${stateChange.previous} → ${stateChange.current}`);
if (stateChange.current === 'failed') {
this._replaceConnection(index);
Contributor

If a connection fails, there is probably a network issue, and creating a new instance seems unlikely to recover the situation. (Also, in this case, if the new client's connection enters the failed state as a result, this might overflow the call stack?)


When your server handles more concurrent AI response streams than a single connection supports, create additional Realtime clients. Each client uses its own connection with its own message rate budget, so throughput scales linearly with the number of connections.

Route channels to connections using consistent hashing so that all operations for a given channel always go through the same connection. This preserves [message ordering](#ordering) for each response.
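A rough sketch of such routing (assumed names, and a simple hash-modulo scheme rather than the PR's actual implementation): the channel name is hashed to pick a client, so every operation for one channel always uses the same client and connection.

```javascript
// Sketch of hash-based channel routing across a fixed pool of clients.
// Note this is plain modulo hashing, not true consistent hashing: resizing
// the pool remaps most channels. Stable routing per channel preserves
// per-response message ordering, since each response uses one connection.
class AblyClientPool {
  // createClient would typically be (i) => new Ably.Realtime(options).
  constructor(size, createClient) {
    this.clients = Array.from({ length: size }, (_, i) => createClient(i));
  }

  // djb2 string hash; any stable hash of the channel name works here.
  static hash(name) {
    let h = 5381;
    for (let i = 0; i < name.length; i++) {
      h = (h * 33 + name.charCodeAt(i)) >>> 0;
    }
    return h;
  }

  // All operations for a given channel go through the same client.
  getClient(channelName) {
    return this.clients[AblyClientPool.hash(channelName) % this.clients.length];
  }
}
```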
Contributor

The implementation looks like standard modulo hashing, not consistent hashing

Contributor Author

yeah, nice! fixed

Simplify the transient publishing section to clarify when
transient publish is used and why the per-connection inbound
message rate is the binding constraint. Rename AblyConnectionPool
to AblyClientPool for accuracy since the pool manages client
instances. Fix "consistent hashing" terminology to "hash function"
since the implementation uses modulo hashing. Remove the
_replaceConnection method which could cause stack overflow and
is unlikely to recover from network failures.
@ably-ci ably-ci temporarily deployed to ably-docs-zak-ait-443-s-whtjs0 March 2, 2026 16:49 Inactive
@zknill
Contributor Author

zknill commented Mar 2, 2026

Thanks for the review @mschristensen, two things left to decide:

  1. What to do about the overlapping content around appendRollupWindow and rate limits?
  2. What to do about the suggestion / thought of moving the AblyClientPool into the ably-js SDK?

I don't have strong views on these; I could make arguments either way (but I've pushed as-is because I think the current version of leaving both here is my favourite).

@zknill zknill requested a review from mschristensen March 2, 2026 16:50
@zknill
Contributor Author

zknill commented Mar 3, 2026

Decided:

  1. the token rate limits thing should go on the token rate limit page
  2. move the connection/client pooling thing into ably-js

Move detailed rate limit content (rollup tables, code examples,
concurrent streams data) from server-publishing.mdx to
token-rate-limits.mdx where it belongs. Replace the full
AblyClientPool implementation with concise conceptual guidance,
as the pool abstraction will be implemented in ably-js instead.

server-publishing.mdx now links to token-rate-limits.mdx for
rollup configuration details and retains a brief overview section
for each token streaming pattern.
