31.05.2014 02:05, I wrote:
> The claimed deficiencies of the esound sink are high latency and even
> worse latency estimation, i.e. a/v sync issues. However, there is
> something strange (possible bug, found by code inspection, I have not
> tested anything yet) in module-esound-sink.c. It creates a socket,
> relies on the socket buffer becoming full for throttling the sink
> rendering process, but never sets the SO_SNDBUF option, either directly
> or through the helper from pulsecore/socket-util.c. And the default is
> more than 256 KB! So no wonder that the socket accumulates a lot of
> sound data (and thus latency) before throttling.
>
> As for the bad latency estimation, I think this applies only to
> networked connections. Indeed, the esound protocol has a request for
> querying the server-internal latency, and PulseAudio issues it. The
> total latency consists of the amount of the sound data buffered in the
> esound server, the network, and locally in the client. The only unknown
> here is the network: the server-internal latency can be queried, and the
> amount of locally-buffered data is known via SIOCOUTQ. But for local
> connections, the amount of data buffered by the network is zero, so this
> criticism also seems unfounded in the XRDP case.

Yesterday and today I played with sockets and also with the real esd, and here is the degree to which the criticisms above are valid.

Summary: even if xrdp implements every aspect of the esound protocol perfectly, we won't be able to get the latency below 25 ms (4480 bytes) for CD-format (44100 Hz, 16-bit, stereo) samples, and even that would require PulseAudio to work around a server-side bug in the real esd. Since the original patch submission, by its code, effectively stated that 30 ms of latency is good enough, I guess that the 25 ms limitation is not a showstopper for CD-format samples. But the 4480-byte latency can be somewhat problematic for lower-quality formats.

The esound protocol, as I have already said, relies on the socket buffers becoming full as the means of synchronization. This means that the minimum achievable latency is directly related to the minimum socket buffer size. If I set the buffer size to 1 byte, the kernel bumps it to the real minimum:

SO_RCVBUF -> 2304
SO_SNDBUF -> 4608

OK. So let's create a unix-domain socket, bind it to /tmp/demo.sock, set these buffer sizes, accept a connection, and not read anything. The expectation is that the client will only be able to write a limited amount of data to the socket before writes start blocking. This is very easy to measure by making the client socket non-blocking and writing data until the write fails. In my experiment, with the minimal buffer sizes on both the client and the server, I was able to write 4480 bytes. I am not able to relate this number to the buffer sizes above, but maybe I shouldn't try. In any case, this number (4480 bytes) determines the minimum latency achievable in any setup that relies on blocking when the unix-domain socket buffer becomes full. For typical CD-format samples (176400 bytes per second), this means that the theoretical minimum latency is 25.4 ms.

Then, let's see how PulseAudio's estimation of the queue length works here. It uses the SIOCOUTQ ioctl, which in my case returns 8704. That is nonsense (in other words, a kernel bug), especially since the other end can receive only 4480 bytes.
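For reference, here is a minimal sketch of this measurement (not the exact program I used; the 64-byte chunk size is arbitrary, most error checking is omitted, and it assumes Linux for SIOCOUTQ):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <linux/sockios.h>      /* SIOCOUTQ */

int main(void)
{
    struct sockaddr_un addr;
    int one = 1;                /* the kernel bumps this to its real minimum */

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/demo.sock", sizeof(addr.sun_path) - 1);
    unlink(addr.sun_path);

    /* "Server": listen with the smallest possible receive buffer and
     * never read from the accepted connection. The buffer size set on
     * the listening socket is inherited by the accepted one. */
    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    setsockopt(lfd, SOL_SOCKET, SO_RCVBUF, &one, sizeof(one));
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }
    listen(lfd, 1);

    /* "Client": smallest possible send buffer, non-blocking writes. */
    int cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    setsockopt(cfd, SOL_SOCKET, SO_SNDBUF, &one, sizeof(one));
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }
    accept(lfd, NULL, NULL);    /* keep the connection, never read from it */

    fcntl(cfd, F_SETFL, fcntl(cfd, F_GETFL) | O_NONBLOCK);

    char chunk[64] = { 0 };     /* chunk size is arbitrary */
    size_t total = 0;
    for (;;) {
        ssize_t n = write(cfd, chunk, sizeof(chunk));
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;          /* buffers full: this is the number of interest */
            perror("write");
            return 1;
        }
        total += (size_t)n;
    }

    int queued = 0;
    ioctl(cfd, SIOCOUTQ, &queued);  /* what PulseAudio uses for its latency estimate */
    printf("wrote %zu bytes before EAGAIN, SIOCOUTQ says %d\n", total, queued);
    printf("that is %.1f ms at 44100 Hz, 16 bit, stereo\n", total / 176.4);
    return 0;
}

With the minimal buffers on both ends, this is where the 4480 and 8704 figures above come from.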
Just for fun, I repeated this test using regular TCP sockets over a wi-fi link. The minimum buffer sizes are the same. I was able to send 1152 bytes, and then 1152 bytes more, before getting EAGAIN. At that point, SIOCOUTQ said that 1152 bytes were buffered locally. Well, that's more sane than in the unix-domain-socket case (it can be interpreted as "1152 bytes are buffered locally and 1152 bytes must be buffered remotely", which matches the traffic dump, 1152 being the TCP window size), but it still fails to account for the remote buffer, and I don't know how to explain this value in terms of SO_{SND,RCV}BUF and the manual pages.

With a bigger SO_SNDBUF value, both in the TCP and in the unix-domain case, I am able to "send" more before the socket blocks. In the TCP case, SIOCOUTQ correctly indicates that the bytes actually get queued locally. In the unix-domain case, its result also increases, but (with the minimal buffers on the receiving side) remains off by approximately 4 KB from what I would expect.

Unfortunately, we can't just set the send buffer size to the minimum, because that would break communication with the real esd. The problem is in its read_player() function:

    if (actual < player->buffer_length - player->actual_length)
        break;

I.e., on any partial read (which is going to happen if the sender uses a small buffer), the contents are just thrown out and not mixed. The typical read size is 4096 bytes, but in pathological situations (OSS on a bad card) it can be up to 86016 bytes. By the way, jesd-0.0.7 (from 2000) does not have this bug.

To work around the bug, we need to use pa_sink_render_full(), so that the data is written using as few packets as possible, together with a compatible send buffer size. The minimum buffer size that doesn't trigger the bug can be estimated from the latency report provided by esd. Also, we can omit the workaround for unix-domain sockets, as nobody is going to run the real esd on the same local machine as PulseAudio.
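To make the last paragraph concrete, here is a hypothetical, uncompiled fragment of how the workaround could look in module-esound-sink.c. Only pa_sink_render() and pa_sink_render_full() are existing PulseAudio functions; the userdata fields, and especially u->is_unix_socket, are made-up names for illustration:

/* Hypothetical sketch of the workaround described above.
 * pa_sink_render_full() returns a chunk of exactly the requested length,
 * so a single write() can carry one complete block and esd's read_player()
 * never sees a partial packet. */
static void refill_memchunk(struct userdata *u) {
    if (u->memchunk.length > 0)
        return;

    if (u->is_unix_socket) {
        /* Local connection: the real esd is not expected here, so partial
         * packets are harmless and we keep the send buffer (and thus the
         * latency) at the minimum. */
        pa_sink_render(u->sink, u->block_size, &u->memchunk);
    } else {
        /* Remote (possibly real) esd: always hand it a full block, and
         * pair this with an SO_SNDBUF of at least u->block_size bytes so
         * that the block is not split across partial reads. */
        pa_sink_render_full(u->sink, u->block_size, &u->memchunk);
    }
}

--
Alexander E. Patrakov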