alsa sink latency - how to account for startup delay

georg@xxxxxxxx (Georg Chini) · Tue, 5 Apr 2016 21:54:24 +0200

On 05.04.2016 17:42, Tanu Kaskinen wrote:
> On Sat, 2016-04-02 at 18:13 +0200, Georg Chini wrote:
>> On 02.04.2016 16:10, Tanu Kaskinen wrote:
>>> On Wed, 2016-03-30 at 22:31 +0200, Georg Chini wrote:
>>>> On 30.03.2016 18:06, Tanu Kaskinen wrote:
>>>>> On Tue, 2016-03-29 at 20:29 +0200, Georg Chini wrote:
>>>>>> 5) The pulseaudio sink code takes the first 10ms of audio out of the
>>>>>> loopback buffer,
>>>>>> writes it to the alsa buffer and calls snd_pcm_start().
>>>>> If the sink takes something from the loopback buffer, this means that
>>>>> the first pop() call has been made. Assuming no time has passed since
>>>>> the previous step, the USB bus is still full, and so is the ring
>>>>> buffer. Expected delay: 20 ms.
>>>> Reported delay is exactly the amount of audio that was written to
>>>> the buffer.
>>> That's the bug that I think should be fixed in alsa if possible (and if
>>> it's impossible, I don't see how it could be fixed in pulseaudio
>>> either).
>> It can be fixed (or at least be worked around). If you take a time stamp
>> at the moment when snd_pcm_start() is called and another when
>> the first audio has definitely been played (delay < write_count), then
>> the difference between the time stamps corrected by the amount
>> of audio that has already been played, gives you exactly that
>> missing bit of latency.
> I can't follow that line of reasoning. In the beginning the ring buffer
> is filled to max, and once you call snd_pcm_start(), data starts to
> move from the ring buffer to other buffers (I'll call the other buffers
> the "not-ring-buffer"). Apparently the driver "sees" the not-ring-
> buffer only partially, since it reports a larger latency than just the
> ring buffer fill level, but it still doesn't report the full latency.
> The time between snd_pcm_start() and the point where the reported delay
> does not any more equal the written amount tells the size of the
> visible part of the not-ring-buffer - it's the time it took for the
> first sample to travel from the ring buffer to the invisible part of
> the not-ring-buffer. I don't understand how the time could say anything
> about the size of the invisible part of the not-ring-buffer. Your logic
> "works" only if the visible and invisible parts happen to be of the
> same size.
>
> You should get the same results by calculating
>
>    adjusted delay = ring buffer fill level + 2 * (reported delay - ring buffer fill level)
>
> That formula doesn't make sense, but that's how I understand your logic
> works, with the difference that your fix is based on one measurement
> only, so it's constant over time, while my formula recalculates the
> adjustment every time the delay is queried, so the adjustment size
> varies somewhat depending on the granularity at which audio moves to
> and from the visible part of the not-ring-buffer.
>
> In any case, even if your logic actually makes sense and I'm just
> misunderstanding something, I don't see why the correction should be
> done in pulseaudio instead of the alsa driver.

Well, now I don't understand what you mean. The logic is very simple:
If there is a not reported delay between the time snd_pcm_start() is
called and the time when the first sample is delivered to the DAC, then
this delay will persist and become part of the continuous latency.
That's all, what causes the delay is completely irrelevant.

Maybe what I said above was not complete. At the point in time when
the first audio is played, there are two delays: First the one that is 
reported
by alsa and the other is the difference between the time stamps minus
the played audio. If these two delays don't match, then there is an
"extra delay" that has to be taken into account.
Trying to fix up that delay on every iteration does not make any sense
at all, it is there from the start and it is constant.

You are actually right - the problem should be fixed in the alsa driver,
but that should not hinder pulseaudio to work around the issue for
multiple reasons:
1) It is relatively easy to work around
2) If the problem is fixed in the alsa driver, the delay would just be
0 and so have no impact anymore. This means there is no need
to change the code after the bug has been fixed in alsa.
3) If there is another driver out there with the same or a similar bug,
(which I think is not unlikely) pulseaudio would do the right thing and
fix it up.
4) When it is fixed in the alsa driver pulseaudio should still be able to
deliver the right values with a not fixed driver - at least for some time
until old versions of the alsa driver have died out.

I've seen other workarounds for USB in the code - see remarks in
alsa-sink and alsa-source or the fix to enable timer based scheduling.

My smoother code contains in fact one more of those workarounds -
for the latency jump that happens shortly after startup. It's the same
thing - if the bug is fixed in alsa it's good and the workaround will not
be triggered but as long as the bug is there, the workaround minimizes
the impact of it.

>
>> That was what my original question was about - what should I do with
>> this extra latency? Currently I am just adding it as an offset to the
>> "normal" latency. This however means, that if you configure let's say
>> 10ms, you will get in fact around 22ms. (You would get 22ms anyway,
>> but the reports would show 10ms with the old code.)
>> For HDA the reported delay is even slightly negative, probably because
>> the card already starts during the preparation step. Negative delays
>> are truncated by my code, no real audio should have been played
>> before snd_pcm_start().
> A negative delay indicates an underrun according to
> http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m___status.html#ga1fdce3985e64f66385a5805da1110f18
This is not a negative delay reported by alsa, but my "extra latency"
is getting negative, which means playback must have started
before snd_pcm_start().
According to Raymond Yau playback seems in fact to be started
before snd_pcm_start() for HDA devices, at least if I read his last
mail on that topic right. Then the negative delays would even make
sense, since data is written to the buffer before snd_pcm_start().

>
>>>> BTW, do you think the debug output of module-loopback is
>>>> better now?
>>> Yes, although if it's logged twice a second, it might be better to
>>> print the status only if explicitly requested via a module argument.
>>>
>> The status is printed once every adjust time and only when debug
>> logging is enabled. 500ms seems to be a good value for the adjust
>> time, it is currently my default. If you prefer an additional argument
>> to enable logging, I can add it.
> Yes, I'd prefer that. Continous logging annoys me when it's about
> something that I'm not currently interested in. module-loopback isn't
> the only thing generating annoying logs, but decreasing the interval
> from 10 seconds to 0.5 seconds makes the problem that much worse.
>
OK, could we do something similar for those "memblock pool full"
messages? Sometimes I get several thousand messages per second
(which are suppressed by the rate limiting) and I cannot see any
relevance.