Skip to content

feat: add channel-level echo suppression spec points#433

Open
mattheworiordan wants to merge 3 commits intomainfrom
feature/channel-echo-param
Open

feat: add channel-level echo suppression spec points#433
mattheworiordan wants to merge 3 commits intomainfrom
feature/channel-echo-param

Conversation

@mattheworiordan
Copy link
Member

@mattheworiordan mattheworiordan commented Mar 5, 2026

Summary

Adds specification points for per-channel echo suppression via the echo channel param:

  • RTL4n: Core behavior — echo=false suppresses own messages on a channel regardless of connection-level echoMessages
  • RTL4n1: Presence unaffected — echo only applies to messages, consistent with RTN2b
  • RTL4n2: Continuing property — changing echo via setOptions triggers re-ATTACH
  • RTL7i/RTL7j: Test requirements for channel-level and connection-level interaction
  • TO3h/RTN2b: Cross-references to the new channel param

Why

Echo control has been connection-level only (echoMessages: false) since the beginning. This forces all channels on a connection to share the same echo behavior, which is inadequate for:

  • AI Transport — agents publish tokens at high frequency and subscribe for control messages. Every AI Transport example sets echoMessages: false at the connection level
  • RPC over channels — callers publish requests and subscribe for responses on the same channel, requiring echoMessages: false at the connection level
  • Mixed workloads — some channels need echo (collaborative editing) while others don't (notifications, telemetry)

This has been a long-standing discussion. Channel-level control is simple, fully backwards compatible, and removes a common friction point for customers.

Related PRs

Test plan

  • Spec point numbering verified (no duplicates, follows existing patterns)
  • Cross-references validated
  • Review by spec maintainers

New spec points RTL4n, RTL4n1, RTL4n2 define echo=false as a channel
param for per-channel echo suppression. RTL7i and RTL7j add test
requirements. Updates TO3h and RTN2b with cross-references. Updates
params api-docstring to document the echo channel param.
** @(RTL4m)@ On receipt of an @ATTACHED@, the client library should decode the @flags@ into an array of @ChannelMode@ s (that is, the same format as @ChannelOptions.modes@) and expose it as a read-only @modes@ field of the @RealtimeChannel@ (or a @getModes()@ method where that is more idiomatic). This should only contain @ChannelMode@ s: it should not contain flags which are not modes (see "@TB2d@":#TB2d)
** @(RTL4n)@ If the @params@ object includes an @echo@ key with value @"false"@, messages originating from this connection must not be echoed back on this channel, regardless of the connection-level @echoMessages@ setting ("@TO3h@":#TO3h). The only valid value for the @echo@ channel param is @"false"@. If the @echo@ param is absent or empty, the connection-level @echoMessages@ setting applies.
*** @(RTL4n1)@ The @echo@ channel param only affects messages (action @MESSAGE@). Presence messages must always be delivered regardless of the @echo@ setting, consistent with connection-level @echoMessages@ behavior ("@RTN2b@":#RTN2b).
*** @(RTL4n2)@ The @echo@ channel param is a continuing property: changing it via @setOptions@ ("@RTL16a@":#RTL16a) on an already-attached channel must trigger a new @ATTACH@ to update the server.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these spec items are needed. This is the SDK spec, but these are all instructing the server how to behave ("messages originating from this connection must not be echoed back on this channel" etc). It isn't the SDK's responsibility to interpret, validate, or act based on the params (indeed that's the whole point of params), its responsibility is just to encode them correctly and send them to the server, which behaviour is specified by RTL4k.

Copy link
Member Author

@mattheworiordan mattheworiordan Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the cross-references from RTN2b, TO3h, and the docstring change - those were clearly wrong (leaking channel concerns into connection/docstring domains, thanks for spotting 🙏 )

However, I'd like to keep RTL4n and its sub-points. A few reasons:

  1. RTL4j2 sets precedent. It describes the expected behavior of the rewind channel param in a test scenario, including what the client should receive when rewind=1 is set. If params were truly opaque to the spec, RTL4j2
    wouldn't exist either.
  2. This is end-to-end expected behavior, not server internals. The echo param changes what the end user expects from the library. An SDK setting echo=false as a channel param is making a promise to the developer that their messages won't be echoed. If a future regression corrupts or voids that param, without these spec points and tests, no SDK would detect the breakage.
  3. The spec is the single source of truth for AI agents building SDKs. We've seen this directly with @paddybyers' recent work using the spec to drive SDK implementations. These assertions describe expected behavior that agents (and humans) need to understand. The slight verbosity is a feature, not a bug.

Happy to discuss trimming the wording, but I don't think removing these entirely is right. RTL4n2 I could see dropping since it's just restating RTL16a behavior for params changes.

Copy link
Collaborator

@lawrence-forooghian lawrence-forooghian Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that spec points like these new RTL4n ones are not just unnecessary but actively confusing, because as written it's very easy to come away with the impression that the SDK is meant to interpret the channel echo param.

I've seen agents fall into similar traps already when implementing presence — e.g. RTP9b for presence.update() says "If the client was not already entered, it enters this client into this channel". This spec point is describing a behaviour of Realtime — all that the client needs to do is send the UPDATE — but the agent wrote an implementation that looked in the local presence set and, if not there, sent an ENTER before proceeding to the UPDATE.

I agree that it's good to test end-to-end behaviours, and if we don't have a better way of enforcing that in a cross-SDK fashion at the moment than having a "there must be a test that x" spec point (i.e. your new RTL7 subpoints), then let's do that. But I don't think we should be specifying end-to-end behaviours other than to describe what the client needs to do to enable them.

Copy link
Member

@SimonWoolf SimonWoolf Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RTL4j2 sets precedent. It describes the expected behavior of the rewind channel param in a test scenario, including what the client should receive when rewind=1 is set. If params were truly opaque to the spec, RTL4j2
wouldn't exist either.

That doesn't follow. At least one channelparams test is necessary to test that the SDK is correctly actually including and correctly encoding channelparams. But since it's opaque, you shouldn't need to test them all.

without these spec points and tests, no SDK would detect the breakage.

None of our SDK test suites come remotely close to being a comprehensive set of tests for all our serverside behaviour. And that's ok, because that's not their job. Such tests should be in the realtime repo, which they are since you added them in http://github.com/ably/realtime/pull/8179/changes#diff-0d808dcb3738cdcf18633f4ba4e6620a800ed8381cba2b2a90c32dc86bf9f512R1566 .

These assertions describe expected behavior that agents (and humans) need to understand. The slight verbosity is a feature, not a bug.

Sure. You could imagine we could have a spec which does that job, essentially a combination of the sdk spec with all our documentation, and it would be quite useful for those reasons.

But that's just not what the spec is right now, not even close. Unless you're proposing that these items (plus a handful of others, mostly from you) are the first in a series of PRs that will increase the size of the spec by a factor of 5 by moving to this approach more generally, the only thing this does it make the spec a bit more incoherent. As Lawrence says, a handful of items speccing out serverside behaviour in a spec that mostly specs out clientside behaviour just increases confusion, both for humans and agents.

And I actually don't think we should move in that direction, for DRY reasons. I think we should keep the spec describing in specific detail what SDKs should do, and encourage agents (and humans) that need to understand how the system works as a whole to read it in conjunction with our docs, not try and make this spec do both.

** @(RTL7e)@ This clause has been deleted (redundant to "RSL6b":#RSL6b).
** @(RTL7f)@ A test should exist ensuring published messages are not echoed back to the subscriber when @echoMessages@ is set to false in the @RealtimeClient@ library constructor
** @(RTL7i)@ A test should exist ensuring published messages are not echoed back to the subscriber when @echo@ is set to @"false"@ in the channel @params@ ("@RTL4n@":#RTL4n), even when @echoMessages@ is @true@ (default) at the connection level.
** @(RTL7j)@ A test should exist ensuring that on a connection with @echoMessages: false@, a channel without the @echo@ param still suppresses echo (inherits connection-level behavior).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think we should be mandating all SDKs ave a test for a serverside behaviour

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RTL7f already mandates exactly this pattern: "A test should exist ensuring published messages are not echoed back to the subscriber when echoMessages is set to false." That's testing server-side echo suppression, the SDK just sets a query param, the server does the work.

RTL7i/RTL7j follow that exact pattern for the channel-level equivalent. If the principle is that we shouldn't mandate tests for server behaviour, then RTL7f should also be removed (see comments below on this).

These tests exist to catch regressions. If a server update changes echo param handling, or if an SDK regression corrupts the param encoding, these tests are the safety net. That's exactly why we have end-to-end test requirements in the spec. If we want to change it, then let's change it, but not just for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is that all statements about what tests should exist will move to the UTS. They shouldn't be in this spec.

* @(RTN2)@ The default host used for realtime "websocket":https://ably.com/topic/websockets connections is the "@REC1@":#REC1 primary domain and the following query string params should be used when opening a new connection:
** @(RTN2a)@ @format@ should be @msgpack@ (default) or @json@
** @(RTN2b)@ @echo@ should be @true@ if the @echoMessages@ client option is true; else it should be @false@, which will prevent messages published by the client being echoed back
** @(RTN2b)@ @echo@ should be @true@ if the @echoMessages@ client option is true; else it should be @false@, which will prevent messages published by the client being echoed back. The connection-level @echo@ setting may be overridden on a per-channel basis using the @echo@ channel param; see "@RTL4n@":#RTL4n
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar. First sentence is an instruction for how the sdk should behave (turn echoMessages into an echo qs param), plus a clause explaining what it will do (so the sdk can test it). The new second sentence is essentially user-facing documentation, it's not actionable for an SDK dev.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, removed in d6228a1. Thanks

| push: PushChannel |||| A [`PushChannel`]{@link PushChannel} object. |
| modes: readonly [ChannelMode] ||| RTL4m | An array of [`ChannelMode`]{@link ChannelMode} objects. |
| params: readonly `Dict<String, String>` ||| RTL4k1 | Optional [channel parameters](https://ably.com/docs/realtime/channels/channel-parameters/overview) that configure the behavior of the channel. |
| params: readonly `Dict<String, String>` ||| RTL4k1 | Optional [channel parameters](https://ably.com/docs/realtime/channels/channel-parameters/overview) that configure the behavior of the channel. The `echo` channel param can be set to `"false"` to suppress messages originating from this connection being echoed back on this channel, regardless of the connection-level `echoMessages` setting. See RTL4n. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a user-visible docstring, it links to the user docs which have a list of accepted params, it shouldn't be too wordy or try to list or explain all (or any) of them. It also shouldn't reference a spec item, which is meaningless to a user trying to integrate with the library.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, reverted in c4da037, I should have spotted that one 😢

The params docstring is user-facing and should remain generic,
linking to the docs for details. It shouldn't call out specific
params or reference spec items.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot temporarily deployed to staging/pull/433 March 5, 2026 22:19 Inactive
These cross-references leak channel-level concerns into the
connection domain. The echo channel param is documented in
RTL4n where it belongs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattheworiordan
Copy link
Member Author

Thanks for the review @SimonWoolf, I've addressed points above.

More broadly though, this review highlights something I think we need to address. With AI agents increasingly contributing to the spec (and using it as source of truth for SDK implementations, as we've seen with Paddy's
work), we need clear, codified conventions so contributors know what's expected.

Right now, conventions are implicit and sometimes inconsistently applied. For example, RTL7f mandates a test for server-side echo suppression (which is the exact pattern RTL7i/RTL7j follow), and RTL4j2 describes expected
behavior of the rewind channel param in a test scenario. An AI agent following existing precedent will naturally produce similar patterns, and that's reasonable. If we want agents (and humans) to deviate from what exists today, we need to write that down.

Simon, you're the second-highest contributor to the spec by lines added (~1,766 additions in the last year), and one of the most technically rigorous reviewers. I'd like to see you drive an AGENTS.md (or CLAUDE.md) file for this repo that codifies the conventions you and other reviewers enforce. This would save significant review cycles, reduce friction for contributors, and give AI agents the guardrails they need to produce spec-quality output on the first pass.

To show what I mean, here's a rough draft based on patterns I've extracted from the last ~12 months of PR review feedback. This is just a starting point / example to be thrown away. Could you own and refine it based on your deeper
understanding of the spec's conventions perhaps with the other core contributor @lawrence-forooghian?

Here's an example of what Claude extracted for an AGENTS.md file:


Ably Specification: AI Agent & Contributor Guidelines

This file provides conventions and guidance for contributing to the Ably features
specification — whether you're a human, an AI coding agent, or using AI assistance.

These conventions are derived from patterns observed across PR reviews by core
maintainers. See also CONTRIBUTING.md for structural and
formatting rules.

Scope: SDK Spec vs Server Behavior

This is the SDK specification — it describes what client libraries must do.

  • DO specify how the SDK should encode, send, and expose data
  • DO describe expected end-to-end behavior that SDKs should test (e.g. "a test
    should exist ensuring X happens when Y is set")
  • DO describe server responses the SDK must handle (ATTACHED params, ERROR
    messages, state transitions)
  • AVOID describing server-internal implementation details that the SDK cannot
    observe or act on
  • AVOID documenting why the server behaves a certain way — just describe the
    observable behavior the SDK needs to account for

Example — good: "If echoMessages is false, published messages are not echoed
back to the subscriber" (observable, testable)

Example — avoid: "The server maintains an internal echo flag per connection
which is checked during message fanout" (server implementation detail)

Channel Params

Channel params are passed to the server via RTL4k. The SDK's role is to encode
and send them, then expose validated params from the ATTACHED response (RTL4k1).

When a channel param has user-observable behavioral implications that SDKs
should test (like echo or rewind), it is appropriate to add spec points
describing the expected behavior and test requirements. This follows the pattern
of RTL4j2 (rewind) and RTL7f (echoMessages).

Language and Precision

  • Use RFC 2119 language for requirements: MUST, SHALL, SHOULD, MAY. Avoid
    using "will" for requirements — it has no RFC 2119 definition
  • Use precise types: Dict<String, String> not object; JSON-encodable object or array not JSON
  • Be explicit about nullability: if a field can be null, say so. Don't use
    empty string as a default for nullable fields unless there's a specific reason

Cross-References

  • Link text must be the spec point ID: write "RTL4n":#RTL4n, not
    free-text descriptions
  • Property references include the spec point ID: write
    the "CD2j":#CD2j @ConnectionDetails.siteCode@
  • Don't reference non-existent spec points
  • Keep references within their domain: don't add channel-level
    cross-references to connection-level spec points or vice versa, unless
    the SDK must act on the relationship

Redundancy

  • Remove redundancy rather than annotating it with comments
  • If a new spec point makes an older one redundant, follow the Replacement
    guidance in CONTRIBUTING.md
  • Don't repeat the same behavioral description in multiple spec points

Test Requirements

  • Test requirements ("A test should exist...") are appropriate when:
    • The SDK has logic to test (encoding, state management, callbacks)
    • End-to-end behavior should be verified to catch regressions
    • There is interaction between SDK options (e.g. connection-level vs
      channel-level settings)
  • Follow the existing pattern: describe what the test should verify, not
    how to implement it

API Docstrings (api-docstrings.md)

  • Docstrings are user-facing — they appear in SDK documentation
  • Keep descriptions generic where the field links to external docs
  • Never reference spec point IDs in docstrings (meaningless to users)
  • Don't enumerate specific values for open-ended fields like params
    link to the docs instead
  • See CONTRIBUTING.md for detailed formatting conventions

Structural Rules

  • Parent clauses are headers, not behavior — put implementation details
    in subclauses
  • One behavior per subclause — avoid conditional statements and multiple
    behaviors in a single spec item
  • Order for coherence, not strictly by ID — readability takes priority
    over numerical ordering
  • Don't over-specify internal state — if a simpler description of
    observable behavior suffices, prefer that over prescribing internal
    buffering or state management

Spec Point Lifecycle

  • Never reuse deleted IDs — if XXX1b was deleted, add XXX1d not a
    new XXX1b
  • Replacement > Deletion: when a new spec point supersedes an old one,
    use "replaced by" not "deleted"
  • Version markers are for behavioral changes only — don't add "as of
    version X.Y" for clarifications or wording improvements

Error Codes

  • Use distinct error codes for distinct failure modes (aids observability)
  • Server errors use the 500 namespace
  • User/client errors use the 4xx namespace
  • Don't duplicate error codes from the core SDK in product-specific sections

Composability

  • Consider that SDKs may be used in composable/modular architectures
  • Avoid methods like offAll or unsubscribeAll that break composability

@lawrence-forooghian
Copy link
Collaborator

Totally agree re having a good CLAUDE.md for the spec. I've raised https://ably.atlassian.net/browse/AIT-530 — will bring it up in AIT planning tomorrow

@SimonWoolf
Copy link
Member

@mattheworiordan fully agree with this 👍, I'm happy to take on, but also very happy for @lawrence-forooghian to drive if he'd like to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants