31.05.2014 02:05, I wrote:
> The claimed deficiencies of the esound sink are high latency and even
> worse latency estimation, i.e. a/v sync issues. However, there is
> something strange (possible bug, found by code inspection, I have not
> tested anything yet) in module-esound-sink.c. It creates a socket,
> relies on the socket buffer becoming full for throttling the sink
> rendering process, but never sets the SO_SNDBUF option, either directly
> or through the helper from pulsecore/socket-util.c. And the default is
> more than 256 KB! So no wonder that the socket accumulates a lot of
> sound data (and thus latency) before throttling.
>
> As for the bad latency estimation, I think this applies only to
> networked connections. Indeed, the esound protocol has a request for
> querying the server-internal latency, and PulseAudio issues it. The
> total latency consists of the amount of the sound data buffered in the
> esound server, the network, and locally in the client. The only unknown
> here is the network: the server-internal latency can be queried, and the
> amount of locally-buffered data is known via SIOCOUTQ. But for local
> connections, the amount of data buffered by the network is zero, so this
> criticism also seems unfounded in the XRDP case.

Yesterday and today I played with sockets and also with the real esd, and here is the degree to which the criticisms above are valid.

Summary: even if xrdp implements every aspect of the esound protocol perfectly, we won't be able to get the latency below 25 ms (4480 bytes) for CD-format (44100 Hz, 16-bit, stereo) samples, and even that would require PulseAudio to work around a server-side bug in the real esd. Since the original patch submission, by its code, effectively stated that 30 ms of latency is good enough, I guess that the 25 ms limitation is not a showstopper for CD-format samples. But the 4480-byte latency can be somewhat problematic for lower-quality formats.

The esound protocol, as I have already said, relies on the socket buffers becoming full as the means of synchronization. This means that the minimum achievable latency is directly related to the minimum socket buffer size. If I set the buffer size to 1 byte, the kernel bumps it to the real minimum:

SO_RCVBUF -> 2304
SO_SNDBUF -> 4608

OK. So let's create a unix-domain socket, bind it to /tmp/demo.sock, set these buffer sizes, accept a connection, and not read anything. The expectation is that the client will only be able to write a limited amount of data to the socket before writes start blocking. This is very easy to measure by making the client socket non-blocking and writing data until the write fails. In my experiment, with the minimal buffer sizes on both the client and the server, I was able to write 4480 bytes. I am not able to relate this number to the buffer sizes above, but maybe I shouldn't try. In any case, this number (4480 bytes) determines the minimum latency achievable in any setup that relies on blocking when the unix-domain socket buffer becomes full. For typical CD-format samples (176400 bytes per second), this means that the theoretical minimum latency is 25.4 ms.

Then, let's see how PulseAudio's estimation of the queue length works here. It uses the SIOCOUTQ ioctl, which in my case returns 8704. That is nonsense (in other words, a kernel bug), especially since the other end can receive only 4480 bytes.
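For reference, here is a minimal sketch of this measurement (not the exact program I used; the 64-byte chunk size is arbitrary, most error checking is omitted, and it assumes Linux for SIOCOUTQ):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <linux/sockios.h>      /* SIOCOUTQ */

int main(void)
{
    struct sockaddr_un addr;
    int one = 1;                /* the kernel bumps this to its real minimum */

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/demo.sock", sizeof(addr.sun_path) - 1);
    unlink(addr.sun_path);

    /* "Server": listen with the smallest possible receive buffer and
     * never read from the accepted connection. The buffer size set on
     * the listening socket is inherited by the accepted one. */
    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    setsockopt(lfd, SOL_SOCKET, SO_RCVBUF, &one, sizeof(one));
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }
    listen(lfd, 1);

    /* "Client": smallest possible send buffer, non-blocking writes. */
    int cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    setsockopt(cfd, SOL_SOCKET, SO_SNDBUF, &one, sizeof(one));
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }
    accept(lfd, NULL, NULL);    /* keep the connection, never read from it */

    fcntl(cfd, F_SETFL, fcntl(cfd, F_GETFL) | O_NONBLOCK);

    char chunk[64] = { 0 };     /* chunk size is arbitrary */
    size_t total = 0;
    for (;;) {
        ssize_t n = write(cfd, chunk, sizeof(chunk));
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;          /* buffers full: this is the number of interest */
            perror("write");
            return 1;
        }
        total += (size_t)n;
    }

    int queued = 0;
    ioctl(cfd, SIOCOUTQ, &queued);  /* what PulseAudio uses for its latency estimate */
    printf("wrote %zu bytes before EAGAIN, SIOCOUTQ says %d\n", total, queued);
    printf("that is %.1f ms at 44100 Hz, 16 bit, stereo\n", total / 176.4);
    return 0;
}

With the minimal buffers on both ends, this is where the 4480 and 8704 figures above come from.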
Just for fun, I repeated this test using regular TCP sockets over a wi-fi link. The minimum buffer sizes are the same. I was able to send 1152 bytes, and then 1152 bytes more, before getting EAGAIN. At that point, SIOCOUTQ said that 1152 bytes were buffered locally. Well, that's more sane than in the unix-domain-socket case (it can be interpreted as "1152 bytes are buffered locally and 1152 bytes must be buffered remotely", which matches the traffic dump, 1152 being the TCP window size), but it still fails to account for the remote buffer, and I don't know how to explain this value in terms of SO_{SND,RCV}BUF and the manual pages.

With a bigger SO_SNDBUF value, both in the TCP and in the unix-domain case, I am able to "send" more before the socket blocks. In the TCP case, SIOCOUTQ correctly indicates that the bytes actually get queued locally. In the unix-domain case, its result also increases, but (with the minimal buffers on the receiving side) remains off by approximately 4 KB from what I would expect.

Unfortunately, we can't just set the send buffer size to the minimum, because that would break communication with the real esd. The problem is in its read_player() function:

    if (actual < player->buffer_length - player->actual_length)
        break;

I.e., on any partial read (which is going to happen if the sender uses a small buffer), the contents are just thrown out and not mixed. The typical read size is 4096 bytes, but in pathological situations (OSS on a bad card) it can be up to 86016 bytes. By the way, jesd-0.0.7 (from 2000) does not have this bug.

To work around the bug, we need to use pa_sink_render_full(), so that the data is written using as few packets as possible, together with a compatible send buffer size. The minimum buffer size that doesn't trigger the bug can be estimated from the latency report provided by esd. Also, we can omit the workaround for unix-domain sockets, as nobody is going to run the real esd on the same local machine as PulseAudio.
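To make the last paragraph concrete, here is a hypothetical, uncompiled fragment of how the workaround could look in module-esound-sink.c. Only pa_sink_render() and pa_sink_render_full() are existing PulseAudio functions; the userdata fields, and especially u->is_unix_socket, are made-up names for illustration:

/* Hypothetical sketch of the workaround described above.
 * pa_sink_render_full() returns a chunk of exactly the requested length,
 * so a single write() can carry one complete block and esd's read_player()
 * never sees a partial packet. */
static void refill_memchunk(struct userdata *u) {
    if (u->memchunk.length > 0)
        return;

    if (u->is_unix_socket) {
        /* Local connection: the real esd is not expected here, so partial
         * packets are harmless and we keep the send buffer (and thus the
         * latency) at the minimum. */
        pa_sink_render(u->sink, u->block_size, &u->memchunk);
    } else {
        /* Remote (possibly real) esd: always hand it a full block, and
         * pair this with an SO_SNDBUF of at least u->block_size bytes so
         * that the block is not split across partial reads. */
        pa_sink_render_full(u->sink, u->block_size, &u->memchunk);
    }
}

--
Alexander E. Patrakov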