alsa sink latency - how to account for startup delay

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2016-03-30 at 22:31 +0200, Georg Chini wrote:
> On 30.03.2016 18:06, Tanu Kaskinen wrote:
> > 
> > On Tue, 2016-03-29 at 20:29 +0200, Georg Chini wrote:
> > > OK, I thought about it another time and I believe I figured out what the
> > > problem is.
> > > 
> > > (Sometimes I feel like I am talking Chinese to you - I seem not to be
> > > able to express
> > > myself in a way that others can understand, sorry for that. I'll try my
> > > best now.
> > > If we would sit in one room it would probably be sorted out in half an
> > > hour.)
> > > 
> > > The starting point is the observation, that with module-loopback and an
> > > USB sink
> > > the real end-to-end latency is about 10ms longer than configured. This
> > > cannot be
> > > observed with a HDA sink. Let's go through the sequence of events when
> > > module-loopback is started:
> > I'm interested in what delay values alsa reports, so I'll write what I
> > expect the delay to be after each step. One unclear thing is that can
> > the driver report the delay at higher granularity than one URB (= 5 ms
> > in this example)? I'll assume here that it can report the latency with
> > one sample accuracy.
> I had one big mistake in my thinking - it does not matter at all
> when the source starts delivering samples to module-loopback.
> The samples that the source provides are buffered in the memblockq
> and in the end module-loopback adjusts the memblockq so that
> sink latency + source latency + buffer = configured loopback latency.
> That means that even in your previous example where you start with
> 8 seconds initial latency, the controller in module-loopback would
> correct it after some time.
> The latency that is made up by the contents of the memblockq
> (and the render memblockq's on the sink input/source output side)
> can be exactly calculated, so if the resulting overall latency is wrong,
> the error must be on the source or sink side. In this case it must be
> the sink side because changing the source has no impact.
> Nevertheless the sequence below is interesting, because it shows,
> how the latency is calculated on the alsa side. I'll attach a log with
> some debugging information, numbers without units are usec.

I thought that the loopback latency got eventually right, and you only
had trouble in the beginning. Now it sounds like the latency is wrong
all the time.

> > I'll also assume that audio is moved from the ring buffer to the URBs
> > atomically, so the ring buffer fill level jumps by 5 ms every time a
> > URB is filled.
> > 
> > > 
> > > 1) The module sets up a buffer and fills it with silence up to the
> > > configured loopback latency.
> > > Sink and source are configured to 1/3 of the loopback latency. In this
> > > example, we will use
> > > 30ms as loopback latency, so that sink and source are configured to 10ms.
> > I suppose prepare() has not been called in the sink yet, so currently
> > the delay is undefined.
> > 
> > > 
> > > 2) Source and sink are started. For simplicity we assume that the source
> > > starts delivering
> > > samples immediately.
> > I guess the sink still hasn't called prepare()?
> Somewhere here the sink calls prepare.
> > 
> > 
> > > 
> > > 3) As long as sink_input_pop() has not been called, module-loopback
> > > drops the audio
> > > coming in from the source.
> > > 
> > > 4) On the alsa side, the preparation has been done. Again, to simplify
> > > we assume 2 URB's
> > > with 5ms each have been set up and submitted to the USB bus. The first
> > > URB starts
> > > playing silence to the speakers now.
> > prepare() has been called. If we assume that the DAC has not yet
> > consumed anything from the first URB, the "USB bus" latency is 10 ms.
> > PulseAudio hasn't written anything yet to the ring buffer. Expected
> > delay: 10 ms.
> I think the delay is still undefined here. No audio has been sent
> and the stream is not even started.

Yes, you're right, the delay is undefined when the stream hasn't been
started. If we can assume that snd_pcm_start() takes an insignificant
time, then talking about the delay can still make sense, though, if we
define delay as the latency for data that is written now if
snd_pcm_start() is also called now.

> > > 5) The pulseaudio sink code takes the first 10ms of audio out of the
> > > loopback buffer,
> > > writes it to the alsa buffer and calls snd_pcm_start().
> > If the sink takes something from the loopback buffer, this means that
> > the first pop() call has been made. Assuming no time has passed since
> > the previous step, the USB bus is still full, and so is the ring
> > buffer. Expected delay: 20 ms.
> Reported delay is exactly the amount of audio that was written to
> the buffer.

That's the bug that I think should be fixed in alsa if possible (and if
it's impossible, I don't see how it could be fixed in pulseaudio
either). From your explanations, my mental model of the usb sound
driver is that it cycles a constant amount of URBs (packets) that each
carry a fixed amount of audio. The driver tries to keep all URBs queued
at all times, meaning that whenever the DAC releases a URB and the
notification about that reaches the driver, the driver immediately
fills the URB from the ring buffer and puts it back to the queue.

When reporting the delay, ideally the driver would be able to ask the
hardware how many URBs are queued before the DAC and the exact read
position in the current URB, but if that's not possible, at least the
driver could check how many URBs it has sent to the DAC that have not
yet been released. The reported delay should contain the audio in the
queued URBs and in the ring buffer, but at least in the beginning the
driver seems to only report the latency contributed by the ring buffer.

> > > 6) Now sink_input_pop() is called, and that is where things go wrong,
> > > because the
> > > audio from the source is buffered from now on.
> > Is this already the second pop() call? That means that at least one URB
> > has been filled from the ring buffer, otherwise pop() wouldn't have
> > been called due to the ring buffer being full.
> No, the order was wrong, I meant the first pop() callback. But
> this does not matter anyway - see above.
> 
> > During the pop() call the ring buffer contains 0 ms of audio, if two
> > URBs have been filled from the ring buffer, or 5 ms, if only one has
> > been filled.
> > 
> > Expected delay during the pop() call: 0 or 5 ms + whatever the DAC
> > hasn't yet consumed from the USB bus. Expected delay after the sink has
> > finished rendering and writing to the ring buffer: 10 ms + whatever the
> > DAC hasn't yet consumed from the USB bus.
> Since I got it wrong and we are still at the first pop() call, the
> delay is still 10ms.
> 
> > > 7) On the alsa side nothing has happened yet. The reported delay is what
> > > is in the alsa
> > > buffer - 10ms. Alsa has to wait for the first URB to expire before any
> > > sample can be
> > > submitted to the bus.
> > How can nothing have happened yet on the alsa side? pop() has been
> > called twice already, which means that the alsa driver has already
> > consumed audio from the ring buffer. I'd count that as something
> > happening.
> See above - my mistake. But it looks like pop() is really called
> a second time before the delay starts going down - you can see
> in the attached log, that first there are two writes to the alsa
> buffer - one to fill up to the configured sink latency and then
> another one that shouldn't be there. I could not figure out
> why this happens. The code looks fine and you already checked
> yourself that this shouldn't happen. The second write to the
> alsa buffer increases the delay to 19ms, which is a lot more
> than the configured 13 ms, but the 6 ms difference still
> not account for what I measure. Without using my "startup delay"
> I am about 15 ms off.

You could check what the ring buffer fill level is when the second
write is made. You said that the second write "shouldn't be there", but
that's only the case if pulseaudio fills the ring buffer beyond 13 ms,
and if so, there's a bug in pulseaudio that needs fixing. The second
write is entirely valid if the driver has already consumed data from
the ring buffer, which would indicate that the driver takes also
something else (but not everything?) into account when reporting the
delay.

> > If the first URB hasn't yet expired (ignoring the conflicting
> > information that pop() has been called twice already), it doesn't make
> > sense that alsa would report 10 ms delay. The two submitted URBs add 10
> > ms to the delay, minus everything that the DAC has consumed at this
> > point, which I assume to be very little. On top of that there's 10 ms
> > of audio written to the ring buffer. If alsa reports the full delay to
> > be 10 ms, that's wrong, don't you think? It should report 20 ms (or
> > something close to that to account for whatever the DAC has consumed).
> No, it doesn't. It reports exactly the delay that is made up of the
> audio written to the alsa buffer and nothing more.
> 
> > 
> > 
> > > 
> > > 8) Now the first URB has retired and can be filled with audio. After the
> > > URB is submitted to
> > > the bus, the alsa delay is NOT updated and still shows 10ms. The second
> > > URB starts playing
> > > silence for 5ms. Because the buffer seems filled to the correct level,
> > > the pulseaudio sink
> > > code will wait until the fill level gets lower.
> > When the first URB is retired and filled with audio from the ring
> > buffer, that's when the ring buffer fill level jumps from 10 ms to 5
> > ms. You seem to be saying that after this has happened, the alsa sink
> > doesn't fill up the ring buffer. Why is that? 5 ms is half of the full
> > buffer, and that should be low enough to trigger a refill.
> Maybe the ring buffer fill level changes at that point - I did not check.
> That could be the reason for the refill I mentioned above and also
> the reason why there is no underrun later.
> 
> > 
> > 
> > > 
> > > 9) The second URB has retired, is filled with audio and submitted to the
> > > bus. Still the
> > > delay is NOT updated, but from now on, it is correct, since a sample
> > > that would now be
> > > submitted to the alsa buffer would take 10 ms to the speakers. The first
> > > real audio starts
> > > playing.
> > Here you seem to be describing a situation where the ring buffer is
> > empty, and indeed, in that situation the reported delay should be 10
> > ms. I just don't understand why the ring buffer would be empty (that
> > is, why didn't pulseaudio refill the ring buffer after the first URB
> > was consumed from the ring buffer).
> It looks like it did - see above. But still the delay shows the same
> value.
> 
> > 
> > 
> > > 
> > > 10) Meanwhile the source has been producing samples that have been queued in
> > > the loopback buffer and that make up the extra latency I have been
> > > observing. In
> > > this example it would be 5ms from the second URB plus the time that was left
> > > from the first URB when snd_pcm_start() was called.
> > > 
> > > 11) The first URB-full of audio has been played and the URB retires. Now
> > > the alsa
> > > delay is adjusted to reflect that the first 5ms have been played and the
> > > audio flow
> > > is completely set up.
> > If the ring buffer was empty in the previous step, and we didn't write
> > anything after that, then the retirement of an URB now would cause an
> > underrun, since the retired URB can't be refilled from the ring buffer.
> Yes that was also bothering me a while after I wrote this mail. But
> it looks like the explanation is that the alsa driver does change the
> ring buffer fill level, but not the delay.
> 
> > 
> > 
> > > 
> > > If the first sink_input_pop() would be called at this point, everything
> > > would be fine.
> > > So the problem seems to be that sink_input_pop() is called too early.
> > I don't know, your description contained many strange parts. I don't
> > know what is really happening. But if module-loopback heavily relies on
> > the configured latency to be the same as the real latency, then that's
> > going to cause trouble anyway, because they just aren't the same thing,
> > even if the alsa driver is reporting everything perfectly.
> > 
> Module loopback relies on the reported latencies, the configured
> latencies are only used during setup. If the reported latencies
> don't match the configured it does not matter.
> 
> The attached logs are with my own smoother code, but I have
> changed nothing else in the alsa sink, so the timing is the same.
> BTW, do you think the debug output of module-loopback is
> better now?

Yes, although if it's logged twice a second, it might be better to
print the status only if explicitly requested via a module argument.

-- 
Tanu


[Index of Archives]     [Linux Audio Users]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux