Re: [Last-Call] [AVTCORE] Opsdir last call review of draft-ietf-avtcore-multi-party-rtt-mix-14

Gunnar Hellström <gunnar.hellstrom@xxxxxxxxxxx> · Wed, 28 Apr 2021 13:44:20 +0200

Jürgen,

I have submitted a new version, acting on your comments.

The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-ietf-avtcore-multi-party-rtt-mix/

There is also an HTML version available at:
https://www.ietf.org/archive/id/draft-ietf-avtcore-multi-party-rtt-mix-15.html

A diff from the previous version is available at:
https://www.ietf.org/rfcdiff?url2=draft-ietf-avtcore-multi-party-rtt-mix-15

Here is a summary of what I did:

Actions on review comments from Jürgen Schönwälder:

   A bit more about congestion situations and that they are expected to
   be very rare.

   Explanation of differences in security between the conference-aware
   and the conference-unaware case added in security section.

   Presentation examples with source labels made less confusing, and
   explained.

   Reference to T.140 inserted at first mention of T.140.

   Reference to RFC 8825 inserted to explain WebRTC.

   Nit in wording in terminology section adjusted.

I hope this satisfies your comments.

Thanks,
Gunnar

On 2021-04-27 at 23:58, Gunnar Hellström wrote:
Thank you for your review.

I will provide initial reactions here, and follow up with more exact 
proposals for changes.

On 2021-04-27 at 12:25, Jürgen Schönwälder via Datatracker wrote:
Reviewer: Jürgen Schönwälder
Review result: Has Nits

I am by no means an expert in this area so please take this into
account while reading my comments...

Content comments:

* The document assumes "human" communication, i.e., where text
   originates at a speed of a human and politeness is used to resolve
   concurrency conflicts. This seems to be a fair assumption for the
   considered use cases but what happens if this assumption is not met?
   Can systems or RTP mixers detect and handle such situations
   gracefully or is the idea that any resulting "jerkiness" must be
   accepted if senders misbehave?

[GH] The receivers are expected to declare their reception speed
capability in characters per second (over a 10 second period). A high
capability will allow smooth flow of text from many participants
simultaneously. Section 9 on congestion describes what can be done if
the load gets too high. The mixer can then discard text, and if it has
any means of detecting who the main contributor is (e.g. the SDP
"content" attribute), it can avoid discarding text from that
participant.

This is of course not nice. But it is equally not nice to try to hear
any specific voice in a mixed audio channel with many participants.
The same applies to video when the video carries real information.

So we are in good company with the other media in the problem that 
there are no really good solutions to an information overload situation.

In most cases the typing participants will send a sentence or two and 
then stop and read or listen to the others. In such situations the 
overload will be sorted out after a short while even without the 
participants being very polite.

Do you want me to elaborate more about this in the congestion chapter 9?
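
As a rough illustration of the reply above (not taken from the draft), the discard behavior could be sketched as follows. The class name `CpsBudget`, the parameter `declared_cps`, and the sliding-window accounting are assumptions made for this sketch; the draft's section 9 remains the authority on what a mixer should actually do.

```python
from collections import deque

WINDOW_SECONDS = 10  # the capability is declared over a 10 second period


class CpsBudget:
    """Hypothetical per-receiver budget: characters forwarded in a sliding window."""

    def __init__(self, declared_cps: int):
        self.limit = declared_cps * WINDOW_SECONDS  # max characters per window
        self.sent = deque()  # (timestamp, char_count) pairs

    def _used(self, now: float) -> int:
        # Forget transmissions that have left the window.
        while self.sent and self.sent[0][0] <= now - WINDOW_SECONDS:
            self.sent.popleft()
        return sum(count for _, count in self.sent)

    def forward(self, now, pending, main_contributor=None):
        """Return the text to forward per source; anything left out is discarded.

        pending maps a source name to the text waiting to be sent.
        """
        room = self.limit - self._used(now)
        # Serve the main contributor first so that its text is discarded last.
        order = sorted(pending, key=lambda src: src != main_contributor)
        forwarded = {}
        for src in order:
            text = pending[src][:max(room, 0)]
            if text:
                forwarded[src] = text
                room -= len(text)
        total = sum(len(t) for t in forwarded.values())
        if total:
            self.sent.append((now, total))
        return forwarded
```

With a declared capability of 3 characters per second, the budget is 30 characters per window; if two participants each have 20 characters pending, the main contributor's text passes whole and the other participant's text is cut. How the main contributor is identified is left open here, as in the reply.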

* The solution does not provide end-to-end security since the mixer
   must be trusted to have access to the texts in order to do the
   mixing. This is mentioned in the security considerations and in
   section 2 where alternatives are considered. The reason to not
   select a solution providing end-to-end security is given in section
   1.2. Is there work planned to address this issue, i.e., to
   complement this solution with a solution providing end-to-end
   security?

[GH] There is another individual draft 
"draft-hellstrom-avtcore-multi-party-rtt-solutions", intended to 
document design choices behind the reviewed draft. It discusses the 
end-to-end security topic among other things but does not solve it. If 
there are requests for it, that work could be continued to provide 
specific solutions. But I would like to hear the request first.

Maybe it is more urgent to specify how RFC 8865 "real-time text in 
WebRTC" can be used in a multi-party setting with end-to-end security.

I would prefer to let the discussion of the topic in the reviewed 
draft be sufficient for now.

* Perhaps the recommendation in section 4.2.6 that the mixing method
   for multi-party unaware endpoints is not RECOMMENDED to be used
   should be repeated in the security considerations? It seems there
   are serious limitations, in particular also related to the creation
   of a presentation that can make it impossible to detect masquerade
   attacks. Yes, masquerading is mentioned but from an outside security
   point of view it feels like there was a strong security solution
   that was discarded due to lack of implementation support, there is a
   somewhat OK solution (but not able to provide end-to-end security),
   and there is a pretty ugly solution to accommodate endpoints with no
   support for the other solution. If this is a fair summary, perhaps
   explaining this clearly in the security considerations would be a
   good thing.
[GH] Yes, good point. I will compose a proposal.

* I am confused about Figures 5 and 6 since the mixed identities of
   the sources are once shown in square brackets and once in
   parenthesis. Are labels like [Alice] or [Bob] not inserted by the
   mixer? If so, why would the format on the endpoint be different? Is
   the idea that endpoints try to parse the mixed text in order to
   render it differently? Or was the idea to show that different mixers
   can use different styles to generate labels, i.e., I should not
   really compare Figure 5 and 6?

[GH] The figures should be possible to compare. And, yes, I have 
caused confusion by letting the mixer create labels with brackets in 
figure 5 but with parentheses in figure 6. In figure 6 the brackets 
are inserted by the receiving terminal in a way that has become quite 
common in RTT implementations, but the parentheses come from the 
mixer. Alice is the local user. Her text is merged locally and
therefore gets its label assigned locally.

I will change this so that the type of label framing is consistent,
and insert some words about the labels and their framing.
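
To make the label discussion concrete, here is a minimal sketch of a mixer that merges text for a multi-party unaware endpoint by prefixing each source's text with a label. The function name `mix_for_unaware_endpoint`, the square-bracket framing, and the one-line-per-source-change layout are assumptions for illustration only; they are not the format specified in the draft.

```python
def mix_for_unaware_endpoint(chunks):
    """Merge (source, text) chunks into one text stream with source labels.

    A new labeled line is started only when the source changes;
    consecutive chunks from the same source are appended to its line.
    """
    lines = []
    current = None
    for source, text in chunks:
        if source != current:
            lines.append(f"[{source}] {text}")
            current = source
        else:
            lines[-1] += text
    return "\n".join(lines)


mixed = mix_for_unaware_endpoint(
    [("Bob", "Hi "), ("Bob", "all"), ("Eve", "Hello")]
)
print(mixed)
```

Because the labels end up as ordinary characters in the merged stream, the receiving endpoint has no way to tell a mixer-inserted label from text a participant typed, which is the masquerading concern raised in the review.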

Editorial comments:

* I suggest to cite [T140] when you first refer to it in the
   Introduction:

   OLD

    A requirement related to multi-party sessions from the presentation
    level standard T.140 for real-time text is: "The display of text from

   NEW

    A requirement related to multi-party sessions from the presentation
    level standard T.140 [T140] for real-time text is: "The display of text from

[GH] In the previous review, I got a recommendation to delete many
such standard names before the reference, but you are right that it
would probably be good to use it once.

* as defined -> are defined and missing full stop

   OLD

    The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
    mixer, RTP-translator as defined in [RFC3550]

   NEW

    The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
    mixer, RTP-translator are defined in [RFC3550].
[GH] Yes, will do.

* Add reference(s) to WebRTC in the terminology section?
[GH] Yes, will do.

Thanks,

Gunnar


--
Gunnar Hellström
GHAccess
gunnar.hellstrom@xxxxxxxxxxx