Re: [Last-Call] [AVTCORE] Opsdir last call review of draft-ietf-avtcore-multi-party-rtt-mix-14

Jürgen,

I have submitted a new version, acting on your comments.

The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-ietf-avtcore-multi-party-rtt-mix/

There is also an HTML version available at:
https://www.ietf.org/archive/id/draft-ietf-avtcore-multi-party-rtt-mix-15.html

A diff from the previous version is available at:
https://www.ietf.org/rfcdiff?url2=draft-ietf-avtcore-multi-party-rtt-mix-15


Here is a summary of what I did:

Actions on review comments from Jürgen Schönwälder:

   Added a bit more about congestion situations and that they are
   expected to be very rare.

   Explanation of differences in security between the conference-aware
   and the conference-unaware case added in security section.

   Presentation examples with source labels made less confusing, and
   explained.

   Reference to T.140 inserted at first mention of T.140.

   Reference to RFC 8825 inserted to explain WebRTC.

   Nit in wording in terminology section adjusted.

I hope this satisfies your comments.

Thanks,
Gunnar


On 2021-04-27 at 23:58, Gunnar Hellström wrote:
Thank you for your review.

I will provide initial reactions here, and follow up with more exact proposals for changes.

On 2021-04-27 at 12:25, Jürgen Schönwälder via Datatracker wrote:
Reviewer: Jürgen Schönwälder
Review result: Has Nits

I am by no means an expert in this area so please take this into
account while reading my comments...

Content comments:

* The document assumes "human" communication, i.e., where text
   originates at a speed of a human and politeness is used to resolve
   concurrency conflicts. This seems to be a fair assumption for the
   considered use cases but what happens if this assumption is not met?
   Can systems or RTP mixers detect and handle such situations
   gracefully or is the idea that any resulting "jerkiness" must be
   accepted if senders misbehave?

[GH] The receivers are expected to declare their reception speed capability in characters per second (measured over a 10-second period). A high capability will allow a smooth flow of text from many participants simultaneously. Section 9 on congestion describes what can be done if the load gets too high. The mixer can then discard text, and if it has any means of detecting who the main contributor is (e.g. the SDP "content" attribute), it can avoid discarding text from that participant.
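
As an illustration only, the discard behaviour described above could be sketched roughly as follows. This is not taken from the draft; section 9 remains the authoritative text. The names WindowBudget and select_for_sending, the (source, text) tuple layout, and the policy of serving the main contributor first are all assumptions made for this sketch.

```python
from collections import deque


class WindowBudget:
    """Characters sent to one receiver over a sliding window (hypothetical helper)."""

    def __init__(self, cps, window_s=10.0):
        # Declared capability in characters per second, applied over the window.
        self.capacity = int(cps * window_s)
        self.window_s = window_s
        self.sent = deque()  # entries of (timestamp, character_count)

    def remaining(self, now):
        # Forget transmissions that have fallen out of the window.
        while self.sent and now - self.sent[0][0] >= self.window_s:
            self.sent.popleft()
        return self.capacity - sum(count for _, count in self.sent)

    def record(self, now, count):
        self.sent.append((now, count))


def select_for_sending(pending, budget, now, main_source=None):
    """Split pending (source, text) items into (forwarded, discarded).

    Text from main_source is considered first, so it is the last to be
    discarded when the receiver's budget runs out.
    """
    # Stable sort: items from the main contributor come first.
    ordered = sorted(pending, key=lambda item: item[0] != main_source)
    forwarded, discarded = [], []
    for source, text in ordered:
        if len(text) <= budget.remaining(now):
            budget.record(now, len(text))
            forwarded.append((source, text))
        else:
            discarded.append((source, text))
    return forwarded, discarded
```

With a receiver that declared 1 character per second, the budget is 10 characters per 10-second window, so of two simultaneous 6-character contributions only the one from the main contributor would be forwarded in this sketch.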

This is of course not nice. But it is equally not nice to try to pick out a specific voice in a mixed audio channel with many participants. The same goes for video in cases where the video carries real information.

So we are in good company with the other media in the problem that there are no really good solutions to an information overload situation.

In most cases the typing participants will send a sentence or two and then stop and read or listen to the others. In such situations the overload will be sorted out after a short while even without the participants being very polite.

Do you want me to elaborate more about this in the congestion chapter 9?


* The solution does not provide end-to-end security since the mixer
   must be trusted to have access to the texts in order to do the mixing.
   This is mentioned in the security considerations and in section 2
   where alternatives are considered. The reason to not select a
   solution providing end-to-end security is given in section 1.2. Is
   there work planned to address this issue, i.e., to complement this
   solution with a solution providing end-to-end security?

[GH] There is another individual draft "draft-hellstrom-avtcore-multi-party-rtt-solutions", intended to document design choices behind the reviewed draft. It discusses the end-to-end security topic among other things but does not solve it. If there are requests for it, that work could be continued to provide specific solutions. But I would like to hear the request first.

Maybe it is more urgent to specify how RFC 8865 "real-time text in WebRTC" can be used in a multi-party setting with end-to-end security.

I would prefer to let the discussion of the topic in the reviewed draft be sufficient for now.


* Perhaps the recommendation in section 4.2.6 that the mixing method
   for multi-party unaware endpoints is not RECOMMENDED to be used
   should be repeated in the security considerations? It seems there
   are serious limitations, in particular also related to the creation
   of a presentation that can make it impossible to detect masquerade
   attacks. Yes, masquerading is mentioned but from an outside security
   point of view it feels like there was a strong security solution
   that was discarded due to lack of implementation support, there is a
   somewhat OK solution (but not able to provide end-to-end security),
   and there is a pretty ugly solution to accommodate endpoints with no
   support for the other solution. If this is a fair summary, perhaps
   explaining this clearly in the security considerations would be a
   good thing.
[GH] Yes, good point. I will compose a proposal.

* I am confused about Figures 5 and 6 since the mixed identities of
   the sources are once shown in square brackets and once in
   parenthesis. Are labels like [Alice] or [Bob] not inserted by the
   mixer? If so, why would the format on the endpoint be different? Is
   the idea that endpoints try to parse the mixed text in order to
   render it differently? Or was the idea to show that different mixers
   can use different styles to generate labels, i.e., I should not
   really compare Figure 5 and 6?

[GH] It should be possible to compare the figures. And, yes, I have caused confusion by letting the mixer create labels with brackets in figure 5 but with parentheses in figure 6. In figure 6 the brackets are inserted by the receiving terminal, in a way that has become quite common in RTT implementations, but the parentheses come from the mixer. Alice is the local user. Her text is merged locally and therefore gets its label assigned locally.

I will make the type of label framing consistent and add some words about the labels and their framing.


Editorial comments:

* I suggest to cite [T140] when you first refer to it in the
   Introduction:

   OLD

    A requirement related to multi-party sessions from the presentation
    level standard T.140 for real-time text is: "The display of text from

   NEW

    A requirement related to multi-party sessions from the presentation
    level standard T.140 [T140] for real-time text is: "The display of text from
[GH] In the previous review, I got a recommendation to delete many such standard names before the reference, but you are right that it would probably be good to use it once.

* as defined -> are defined and missing full stop

   OLD

    The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
    mixer, RTP-translator as defined in [RFC3550]

   NEW

    The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
    mixer, RTP-translator are defined in [RFC3550].
[GH] Yes, will do.

* Add reference(s) to WebRTC in the terminology section?
[GH] Yes, will do.

Thanks,

Gunnar



_______________________________________________
Audio/Video Transport Core Maintenance
avt@xxxxxxxx
https://www.ietf.org/mailman/listinfo/avt

--
Gunnar Hellström
GHAccess
gunnar.hellstrom@xxxxxxxxxxx

--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
