[PATCH 00/13] loopback: Make module-loopback honor requested latency (v5)

patrakov@xxxxxxxxx (Alexander E. Patrakov) · Mon, 16 Nov 2015 02:08:37 +0500

25.02.2015 23:43, Georg Chini wrote:
> Hello,
>
> this is the split version of the module-loopback patch. The patch optimizes the latency
> initialization and regulation of the module.
> Many thanks to Alexander Patrakov for splitting and reviewing the patch and also for
> his contribution to the code.
>
> Georg Chini (13):
>    loopback: Fix the obviously-wrong "buffer+=buffer" logic
>    loopback: Save the timestamps when we snapshot latency
>    loopback: Improved estimation of latency
>    loopback: Adjust rates based on latency difference
>    loopback: Don't track max_request and min_memblockq_length
>    loopback: Restart the timer right away
>    loopback: Refactor latency initialization
>    loopback: Track underruns and cant-peek events
>    loopback: Track the amount of jitter
>    loopback: Added a deadband to reduce rate hunting
>    loopback: Don't change rate abruptly
>    loopback: Validate the rate parameter
>    loopback: add parameter buffer_latency_msec
>
>   src/modules/module-loopback.c | 578 ++++++++++++++++++++++++++++++++----------
>   src/pulse/sample.c            |   5 +-
>   2 files changed, 443 insertions(+), 140 deletions(-)
>

I have tested this code again. Here is my opinion.

It would be logical enough to consider the state of the original 
loopback module, and compare the state of affairs after 04/13 and after 
13/13 to it. And compare to other implementations of adaptive resamplers 
as well.

To test the thing, I used my laptop (that has an internal sound card) 
and my desktop (that has an internal PCH soundcard, also I connected my 
Rotel RA-1570 amplifier via USB).

On the laptop, I created a 5-minute wav file containing a sine wave with 
1 kHz frequency. I connected the laptop headphone output to the desktop 
PCH line input, and used the loopback module to transfer audio from PCH 
to the Rotel amplifier. In all cases, the nominal sample rate 44100 Hz 
was used. In all cases, the sound sent to the Rotel amplifier was 
recorded into a wav file, and the resulting file was analysed with a 
Python script.

I have tried two techniques for instrumenting the loopback module.

First, I thought that I can record the sound sent to the Rotel sound 
card by recording from PulseAudio monitor source. However, this turned 
out to be a bad idea: the existing loopback module behaved differently 
(according to its log) when its sink was monitored. Namely, it 
immediately converged to 44100 Hz - something that it could do only 
after two minutes without such monitoring.

So, I created this snippet in .asoundrc:

pcm.savefile {
     type file
     format "wav"
     slave.pcm "hw:PCUSB"
     file "/tmp/save.wav"
}

and loaded module-alsa-sink so that PulseAudio saved its output to a file.

The most important result is that the old module, both when monitored 
and when not monitored, does not respect the requested latency in this 
test case. It settles down to something between 300 and 400 ms, even 
though 200 ms was requested. There is no such bug with the new 
implementation.

Now let's talk about the script. It is attached. A mono wav file 
containing some silence, recording of a resampled 1 kHz sine wave, and 
some silence again, should be passed as an argument. The script will try 
to estimate the phase of the sine wave (y = A * sin(omega * t + phi), 
the script tries to estimate phi) at various points in time, and plot 
the result. The plot will generally contain some linear trend (due to 
unsynchronized clocks) and some wobble around that. Then it will plot 
again, with this linear trend eliminated (i.e. will estimate the phase 
again using the correct signal frequency in terms of the sink clock). 
Both results are saved as png files, twice (first the full version, then 
only the second half, to hide the startup process).

I chose this metric because it was used by Fons Adriaensen when 
discussing the merits of zita-ajbridge:

http://kokkinizita.linuxaudio.org/linuxaudio/zita-ajbridge-doc/quickguide.html

Both my plots and the plots on his page show the unwrapped phase in 
degrees. Phase, in our case, is related to the delay. 360 degrees = 1 
millisecond = 34 cm of air. In other words, 1 degree = 2.77 us = 0.94 mm 
of air.

The first result (https://imgur.com/a/twBEk ) is actually without any 
loopback module. Just a recording from the line input. We can clearly 
see that there is some clock skew between the laptop and desktop 
soundcards, and that the clock ratio is even not constant. OK - this is 
a reference how good any loopback module can be.

The second result (https://imgur.com/a/eVahQ ) is with the old module. 
You see that, with it, the rate oscillates wildly and then snaps to 
44100 Hz. However, the final latency is not correct, and it accumulates 
due to clock mismatch between the two sound cards in the desktop. I have 
not waited enough to see what happens next, but the result is clearly 
invalid.

The next result (https://imgur.com/a/Bi2Yz ) is with the new module, as 
of PATCH 04/13. There is some "rate hunting", and it results to some 
noise in phase. It is generally contained between -150 and 200 degrees.

Next, let's see (https://imgur.com/a/yQGWL ) what the deadband achieves. 
Result: it helps against rate hunting, but not much against phase 
oscillations. This is because new_rate is snapped to base_rate, which 
would only be correct if the cards actually shared the common clock. So 
here is what happens: the module runs for some time with 1:1 resampling 
(i.e. with the rate strictly equal to 44100 Hz), and the latency 
difference slowly drifts due to the clock mismatch. When the difference 
becomes bigger than the error of its measurement, the module corrects 
it, and then runs at 44100 Hz for some time again.

However, I'd argue that this phase metric can be improved without the 
deadband and beyond what this deadband-based implementation provides.

First, see (https://imgur.com/a/P5Y0A ) what happens, at PATCH 04/13, if 
we just decrease adjust_time to 1 second. The rate oscillations become 
bigger, but phase oscillations stay of the same order as with the 
default value of adjust_time (10 seconds). So not good.

Then, let's change the rate controller so that it does not attempt to 
correct all of the latency difference in one step. The value of 
adjust_time is still 1s, but let's aim to correct the difference after 
10 steps by a factor of 2.71. For simplicity (and for testing purposes 
only), let's destroy the non-linear part of the controller. Here is the 
code:

static uint32_t rate_controller(
                 uint32_t base_rate,
                 pa_usec_t adjust_time,
                 int32_t latency_difference_usec) {

     uint32_t new_rate;

     new_rate = base_rate * (1.0 + 0.1 * (double)latency_difference_usec 
/ adjust_time);

     return new_rate;
}

As you can see (https://imgur.com/a/eZT8L ), the phase oscillations are 
now much more limited!

I have received a private email from Georg where he proposes to use the 
deadband, but snap to something other than base_rate, e.g. to a result 
of some slow controller. For me, this is equivalent to using this slow 
controller, i.e. a good idea.

Finally, let's compare Georg's code to known competitors.

alsaloop (part of alsa-utils): https://imgur.com/lvZLoli . Could not 
settle after 300 seconds, so I am not uploading any processed plots.

alsa_in (part of jack1): https://imgur.com/a/9PnW3 . Rather big 
oscillations, definitely worse than what's proposed here.

zita-ajbridge: could not run it, because it deadlocks at the start. 
Could be the same as 
https://bugs.launchpad.net/ubuntu/+source/zita-ajbridge/+bug/1368716

Also, let's consider the limitations of what we can achieve with 
different controllers.

First, power-saving proponents will not let the latency be sampled more 
often than once in 100 ms by default. Thus, we can only adjust rates 
once in 100 ms. It is a PulseAudio limitation that rate is an integer. 
If the correct rate is 44100.5 Hz, then a controller can only set it to 
either 44100 or 44101 Hz. In any case, that's 1/88200 seconds per second 
of clock skew. Per our 100 ms period of sampling the latency difference, 
that's 1.13 us. With a 1 kHz test signal (as used in all examples 
above), that's about 0.4 degrees of phase. So, "rate is an integer" is 
not an important limitation - the theoretically achievable resolution is 
only slightly worse than the natural clock instability of consumer 
soundcards.

TL;DR: I'm in favor of merging up to PATCH 05/13, and, if someone else 
reviews patches 06 and 07, would recommend merging up to PATCH 08/13. I 
would recommend waiting two weeks before merging further patches, as the 
deadband stuff is likely to get undone and redone if the rate controller 
gets replaced. And I am going to experiment further with different rate 
controllers on the next weekends.

-- 
Alexander E. Patrakov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: analyze-phase.py
Type: text/x-python
Size: 2496 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/pulseaudio-discuss/attachments/20151116/589275b3/attachment.py>