30.05.2014 18:43, Tanu Kaskinen wrote:
>> 2 - esound sink and source as Alexander suggests(source not complete).
>> 3 - RTP over unix domain socket(module-rtp-send not complete as
>> Laurentiu Nicola says).
>>
>> I'm ok with 2 or 3, but I want to make sure it's the best decision
>> long term. I think there will be a lot of users using PA this way.
>
> I don't know the details of any of the three protocols (custom xrdp,
> esound or rtp), so I don't have any opinions like "you really should use
> X" or "you really shouldn't use Y".

OK, here are some bad words about the protocols.

The main reason why I am currently against the current custom protocol is:

Any custom protocol will likely evolve, and, with the current inability to build out-of-tree modules, it means that future versions of both xrdp and PulseAudio will have to deal somehow with any resulting version mismatch. The current protocol doesn't provide any versioning, though, and that's a problem _if_ the custom protocol (as opposed to a suitable but set-in-stone standard protocol) is accepted as the way forward.

The second reason was (see below for factors that amend it):

The current custom protocol is essentially a copy of the esound protocol with minor variations. All criticisms that apply to module-esound-sink will also apply to the current module-xrdp-sink. Conversely, if any current criticisms on module-esound-sink actually don't apply in this use case to module-xrdp-sink, then they are irrelevant for module-esound-sink, too.

...which Tanu has worded in a more positive way:

> If the esound protocol "deficiencies" (that I'm not familiar with) don't
> really matter in case of XRDP, and there's not a lot of mandatory extra
> cruft in the protocol that isn't necessary with XRDP, then reusing the
> esound protocol sounds like a good idea.

Note that I don't propose to implement the whole esound protocol - just enough to interoperate with PulseAudio and maybe the most common clients.
The claimed deficiencies of the esound sink are high latency and even worse latency estimation, i.e. a/v sync issues. However, there is something strange (possible bug, found by code inspection, I have not tested anything yet) in module-esound-sink.c. It creates a socket, relies on the socket buffer becoming full for throttling the sink rendering process, but never sets the SO_SNDBUF option, either directly or through the helper from pulsecore/socket-util.c. And the default is more than 256 KB! So no wonder that the socket accumulates a lot of sound data (and thus latency) before throttling.

As for the bad latency estimation, I think this applies only to networked connections. Indeed, the esound protocol has a request for querying the server-internal latency, and PulseAudio issues it. The total latency consists of the amount of the sound data buffered in the esound server, the network, and locally in the client. The only unknown here is the network: the server-internal latency can be queried, and the amount of locally-buffered data is known via SIOCOUTQ. But for local connections, the amount of data buffered by the network is zero, so this criticism also seems unfounded in the XRDP case.

Now let's compare the protocols. As Tanu has already mentioned, there is an important difference between the custom protocol and the esound protocol. Namely, the clock source.

module-esound-sink uses the remote clock source: it writes to the socket as quickly as possible until its buffer fills up, and unblocks when esound (or xrdp) reads some data out.

module-xrdp-sink uses the local clock to move samples to the socket (sleep, write, sleep, write, and so on), and assumes that xrdp will read the samples out quickly enough so that the writes never block. I do not know what provides this guarantee.
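
To put a rough number on the SO_SNDBUF observation and on the latency accounting above, here is a small Python sketch. The stream format (44.1 kHz, stereo, 16-bit) and all function names are my assumptions for illustration, and the 256 KB figure is taken as a lower bound on the default buffer; this is not what module-esound-sink.c actually computes.

```python
# Sketch only: how much audio fits in a socket send buffer, and how the
# total latency adds up for a local connection.
# ASSUMPTION: CD-quality stream parameters (not stated in this thread).
SAMPLE_RATE_HZ = 44100
CHANNELS = 2
BYTES_PER_SAMPLE = 2  # signed 16-bit


def bytes_per_second(rate_hz=SAMPLE_RATE_HZ, channels=CHANNELS,
                     sample_bytes=BYTES_PER_SAMPLE):
    """Byte rate of an uncompressed PCM stream."""
    return rate_hz * channels * sample_bytes


def buffered_seconds(buffered_bytes, byte_rate=None):
    """Playback time represented by a number of queued bytes."""
    if byte_rate is None:
        byte_rate = bytes_per_second()
    return buffered_bytes / byte_rate


def total_latency_seconds(server_latency_s, local_queue_bytes, network_s=0.0):
    """Server-internal latency + network + locally queued data.

    For a local (unix-domain) connection the network term is zero, so
    both remaining terms are known: the first from the esound latency
    request, the second from SIOCOUTQ.
    """
    return server_latency_s + network_s + buffered_seconds(local_queue_bytes)


if __name__ == "__main__":
    # A send buffer of 256 KiB that is allowed to fill up completely:
    print(f"256 KiB queued = {buffered_seconds(256 * 1024):.2f} s of audio")
    # Hypothetical readings: 50 ms reported by the server, 17640 bytes queued.
    print(f"total latency  = {total_latency_seconds(0.05, 17640):.3f} s")
```

Under these assumptions a completely full 256 KiB buffer holds roughly 1.5 seconds of audio, which would be consistent with the "high latency" complaint. That still leaves open what guarantees that module-xrdp-sink's writes never block.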
For the writes never to block, there should be "something" somewhere that measures the rate at which the sound samples are arriving, and compensates for the clock drift between the local system and the remote sound card. I.e. let's suppose that the remote system thinks that the fragments being sent out are 29.99 ms apart, and not 30 ms as the local system thinks. The difference will accumulate, and, unless some samples are dropped or the stream is resampled by a factor of 30/29.99, there will be something like a blocked socket or overfilled buffer. The same "need to have an adaptive resampler" problem applies to RTP or to any other protocol that relies on the local clock.

If the wanted semantics is "remote soundcard clock is the master clock", then the esound protocol will be suitable. If "local clock is the master clock" is actually wanted, then any of the three protocols would somehow work (and with the esound protocol, the local clock would then be inside the xrdp server).

Now let's turn to protocol elements.

The custom protocol has an explicit opcode for pausing the stream. This was one of the reasons that led to its creation. I don't know yet whether PulseAudio would suspend the esound-protocol stream, but if necessary, this could be added. The possible implementation alternatives are to either disconnect until it has something else to play (which PulseAudio certainly does not do), or to simply stop the data flow (which I have yet to test). In the second case, xrdp could detect the pause by observing that it can read nothing out of the socket for a sufficiently long time.

The esound protocol has only three protocol elements that one would need to implement in xrdp: cookie-based authentication, latency request and audio stream playback. Cookie-based authentication is stupid but easy, so should not be a problem. Latency request is actually a good thing: it allows PulseAudio to report to the client how long it would take for the last-written sample to reach the playback device.
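
Returning briefly to the clock-drift example earlier in this message (29.99 ms versus 30 ms), the arithmetic can be sketched in Python as follows. The function names and the 100 ms of buffer slack are hypothetical and chosen only for illustration; only the two fragment periods come from the example.

```python
# Sketch only: how fast a small clock mismatch accumulates when nothing
# (dropping samples, adaptive resampling) compensates for it.

def resample_factor(local_period_ms, remote_period_ms):
    """Factor by which the stream would have to be resampled."""
    return local_period_ms / remote_period_ms


def relative_error(local_period_ms, remote_period_ms):
    """Magnitude of the rate mismatch, relative to the local clock."""
    return abs(local_period_ms - remote_period_ms) / local_period_ms


def drift_per_hour_seconds(local_period_ms, remote_period_ms):
    """Audio time gained or lost during one hour of playback."""
    return relative_error(local_period_ms, remote_period_ms) * 3600.0


def seconds_until_slack_used(slack_ms, local_period_ms, remote_period_ms):
    """Time until a given amount of buffer slack is eaten by the drift."""
    return (slack_ms / 1000.0) / relative_error(local_period_ms,
                                                remote_period_ms)


if __name__ == "__main__":
    print(f"resample factor: {resample_factor(30.0, 29.99):.6f}")
    print(f"drift per hour : {drift_per_hour_seconds(30.0, 29.99):.2f} s")
    # ASSUMPTION: 100 ms of slack somewhere between sender and sound card.
    print(f"100 ms slack lasts "
          f"{seconds_until_slack_used(100.0, 30.0, 29.99):.0f} s")
```

With these numbers the mismatch is about 333 ppm, i.e. about 1.2 seconds per hour, and a hypothetical 100 ms of slack is used up in about five minutes. Drift is a separate matter from latency reporting, which is what the esound latency request addresses.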
Without the latency request (e.g. with the original custom protocol) or any other way to query or influence the latency, a/v synchronization is impossible. And audio stream playback means just taking audio samples from the socket when they are needed (but not earlier than that). So it should all be quite easily doable.

RTP is a unidirectional packet-based protocol. As such, it does not have any way to query the latency. It does not have any useful way to influence the latency at the receiver, either. Consequently, PulseAudio does not have any means for offering accurate latency reports, and a/v synchronization is impossible.

The RTP protocol elements that are not repeated between packets, besides the actual audio data, are the packet sequence number and the timestamp. In the xrdp case the sequence number is probably not interesting, as it just increases for each packet by one. It can be useful for packet loss detection, but packets are not lost in a unix-domain socket if they are read out of the socket in a timely manner.

The timestamp starts from 0 and is incremented by 1 for each audio sample. It is useful for reconstructing the exact duration of silence represented by not transmitting any packets. Its relation to the wall clock is conveyed in the SDP announced via the SAP port, by means of the NTP-style timestamp of the start of the transmission, with one-second precision. So this is not useful for determining when exactly, according to the wall clock, a given packet should be played.

Based on the above, I think that among the three protocols discussed, the esound protocol, if any (this is important!), is the way to go.

--
Alexander E. Patrakov