Thank you for your review.
I will provide initial reactions here, and follow up with more exact
proposals for changes.
On 2021-04-27 at 12:25, Jürgen Schönwälder via Datatracker wrote:
Reviewer: Jürgen Schönwälder
Review result: Has Nits
I am by no means an expert in this area so please take this into
account while reading my comments...
Content comments:
* The document assumes "human" communication, i.e., where text
originates at the speed of a human and politeness is used to resolve
concurrency conflicts. This seems to be a fair assumption for the
considered use cases but what happens if this assumption is not met?
Can systems or RTP mixers detect and handle such situations
gracefully or is the idea that any resulting "jerkiness" must be
accepted if senders misbehave?
[GH] The receivers are expected to declare their reception speed
capability in characters per second (over a 10-second period). A high
capability allows smooth flow of text from many participants
simultaneously. The congestion considerations in section 9 describe what
can be done if the load gets too high. The mixer can then discard text,
and if it has any means to detect who the main contributor is, it can
avoid discarding text from that participant (e.g., by the SDP "content"
attribute).
This is of course not nice. But it is equally hard to follow any
specific voice in a mixed audio channel with many participants, and the
same applies to video when the video carries real information.
So text is in good company with the other media: there are no really
good solutions to an information overload situation.
In most cases the typing participants will send a sentence or two and
then stop and read or listen to the others. In such situations the
overload will be sorted out after a short while even without the
participants being very polite.
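To illustrate the kind of rate check a mixer could apply per receiver,
here is a minimal Python sketch. It assumes that the declared capability
(taken here to be the SDP "cps" value) limits the average rate over a
sliding 10-second window. The class and method names are hypothetical,
and what to do with text that does not fit (delay it, or discard it as
discussed for section 9) is left to the caller.

```python
from collections import deque


class CpsWindow:
    """Hypothetical per-receiver tracker of characters sent in a sliding window.

    Assumption: the receiver's declared characters-per-second capability
    is a limit on the average rate over a 10-second window.
    """

    def __init__(self, cps: int, window_s: float = 10.0) -> None:
        self.budget = cps * window_s  # max characters allowed per window
        self.window_s = window_s
        self.sent = deque()           # (timestamp, char_count) pairs, oldest first
        self.total = 0                # characters currently inside the window

    def _expire(self, now: float) -> None:
        # Drop entries that have slid out of the window.
        while self.sent and now - self.sent[0][0] >= self.window_s:
            _, n = self.sent.popleft()
            self.total -= n

    def try_send(self, now: float, text: str) -> bool:
        """Record and accept text if it fits the budget.

        Returning False signals congestion: the caller must delay or
        discard the text (e.g., preferring to keep the main contributor).
        """
        self._expire(now)
        if self.total + len(text) > self.budget:
            return False
        self.sent.append((now, len(text)))
        self.total += len(text)
        return True


if __name__ == "__main__":
    w = CpsWindow(cps=30)               # budget: 300 characters per 10 s
    print(w.try_send(0.0, "a" * 300))   # fills the whole budget
    print(w.try_send(1.0, "b"))         # over budget, must wait or be dropped
    print(w.try_send(10.0, "b"))        # first burst has left the window
```

With a bursty typist the check rejects text only briefly, which matches
the observation above that overload usually sorts itself out once
participants pause to read or listen.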
Do you want me to elaborate more on this in the congestion section (section 9)?
* The solution does not provide end-to-end security since the mixer
must be trusted to have access to the texts in order to do the mixing.
This is mentioned in the security considerations and in section 2
where alternatives are considered. The reason for not selecting a
solution providing end-to-end security is given in section 1.2. Is
there work planned to address this issue, i.e., to complement this
solution with a solution providing end-to-end security?
[GH] There is another individual draft
"draft-hellstrom-avtcore-multi-party-rtt-solutions", intended to
document design choices behind the reviewed draft. It discusses the
end-to-end security topic among other things but does not solve it. If
there are requests for it, that work could be continued to provide
specific solutions. But I would like to hear the request first.
Maybe it is more urgent to specify how RFC 8865 "real-time text in
WebRTC" can be used in a multi-party setting with end-to-end security.
I would prefer to let the discussion of the topic in the reviewed draft
be sufficient for now.
* Perhaps the recommendation in section 4.2.6 that the mixing method
for multi-party unaware endpoints is not RECOMMENDED to be used
should be repeated in the security considerations? It seems there
are serious limitations, in particular also related to the creation
of a presentation that can make it impossible to detect masquerade
attacks. Yes, masquerading is mentioned but from an outside security
point of view it feels like there was a strong security solution
that was discarded due to lack of implementation support, there is a
somewhat OK solution (but not able to provide end-to-end security),
and there is a pretty ugly solution to accommodate endpoints with no
support for the other solution. If this is a fair summary, perhaps
explaining this clearly in the security considerations would be a
good thing.
[GH] Yes, good point. I will compose a proposal.
* I am confused about Figures 5 and 6 since the mixed identities of
the sources are once shown in square brackets and once in
parentheses. Are labels like [Alice] or [Bob] not inserted by the
mixer? If so, why would the format on the endpoint be different? Is
the idea that endpoints try to parse the mixed text in order to
render it differently? Or was the idea to show that different mixers
can use different styles to generate labels, i.e., I should not
really compare Figure 5 and 6?
[GH] The figures should be possible to compare. And, yes, I have caused
confusion by letting the mixer create labels with brackets in figure 5
but with parentheses in figure 6. In figure 6 the brackets are inserted
by the receiving terminal in a way that has become quite common in RTT
implementations, but the parentheses come from the mixer. Alice is the
local user. Her text is merged locally and therefore gets its label
assigned locally.
I will make the label framing consistent and add some words about the
labels and their framing.
Editorial comments:
* I suggest citing [T140] when you first refer to it in the
Introduction:
OLD
A requirement related to multi-party sessions from the presentation
level standard T.140 for real-time text is: "The display of text from
NEW
A requirement related to multi-party sessions from the presentation
level standard T.140 [T140] for real-time text is: "The display of text from
[GH] In the previous review, I got a recommendation to delete many such
standard names before the reference, but you are right that it would
probably be good to use it once.
* as defined -> are defined and missing full stop
OLD
The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
mixer, RTP-translator as defined in [RFC3550]
NEW
The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
mixer, RTP-translator are defined in [RFC3550].
[GH] Yes, will do.
* Add reference(s) to WebRTC in the terminology section?
[GH] Yes, will do.
Thanks,
Gunnar
_______________________________________________
Audio/Video Transport Core Maintenance
avt@xxxxxxxx
https://www.ietf.org/mailman/listinfo/avt
--
Gunnar Hellström
GHAccess
gunnar.hellstrom@xxxxxxxxxxx
--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call