[PATCH 02/21 v2] loopback: Initialize latency at startup and during source/sink changes

tanuk@xxxxxx (Tanu Kaskinen) · Mon, 27 Feb 2017 12:38:14 +0200

On Mon, 2017-02-27 at 08:44 +0100, Georg Chini wrote:
> > > > > > > > > +                /* If the source has overrun, assume that the maximum it should have pushed is
> > > > > > > > > +                 * one full source latency. It may still be possible that the next push also
> > > > > > > > > +                 * contains too much data, then the resulting latency will be wrong. */
> > > > > > > > > +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
> > > > > > > > > +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
> > > > > > > > > +                else
> > > > > > > > > +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
> > > > > > > > 
> > > > > > > > It's unclear to me what's happening here. I guess you're substracting
> > > > > > > > chunk->length from time_delta, because when push_cb was called, the
> > > > > > > > chunk was still included in the source latency report (I'm a bit
> > > > > > > > worried that some sources might not do this), but the chunk has also
> > > > > > > > been pushed to the memblockq, so if we don't adjust time_delta, the
> > > > > > > > chunk will be counted twice, and the memblockq adjustment will be
> > > > > > > > wrong.
> > > > > > > 
> > > > > > > Yes, exactly.
> > > > > > > > But I don't really get why you are concerned with "overruns", and how
> > > > > > > > your code mitigates any problems. If you substract only a part of
> > > > > > > > chunk->length, it means that a part of the chunk will be counted twice,
> > > > > > > > resulting in wrong latency.
> > > > > > > 
> > > > > > > No, it means, that the memblockq will be adjusted to the correct length
> > > > > > > because it is adjusted after the data has been pushed.
> > > > > > 
> > > > > > The correct memblockq length is u->latency - sink latency - source
> > > > > > latency. To me it seems that your adjustment makes us overestimate the
> > > > > > source latency.
> > > > > > 
> > > > > > The latency report that the source gives is (1) the current chunk
> > > > > > length, plus (2) any buffered data that isn't included in the current
> > > > > > chunk. In adjust_memblockq() we're only interested in (2), which is why
> > > > > > we subtract the chunk length from the reported source latency. If the
> > > > > > chunk was huge for some reason, adjust_memblockq() will in any case
> > > > > > drop excess data.
> > > > > > 
> > > > > > > If the source
> > > > > > > has overrun and delivered much more data than expected, the excess
> > > > > > > data will be dropped.
> > > > > > 
> > > > > > If the source latency is overestimated, too much data will be dropped.
> > > > > 
> > > > > You are talking about two different things. The source latency is
> > > > > very different from the first chunk of data that is pushed.
> > > > 
> > > > Yes, they are very different things. But adjust_memblockq() needs to
> > > > know the real source latency, it doesn't care about the size of the
> > > > first chunk. It doesn't matter how big the first chunk is, because
> > > > adjust_memblockq() will in any case adjust the memblockq length so that
> > > > the end-to-end latency will match what is configured.
> > > > 
> > > > > When switching quickly back and forth between two sources, it
> > > > > happens, that a source configured to a few ms of latency
> > > > > suddenly pushes 500 ms of data on the first push. I think this
> > > > > is due to the fact that the source is re-configured to its maximum
> > > > > latency when it goes to idle and when switching back to low latency
> > > > > it somehow does not get rid of the excess data soon enough.
> > > > > Remember, the adjustment is only done on the first push, so it is
> > > > > a one-shot thing just to get the queue right. If I don't drop the
> > > > > data, I end up with those 500 ms too much in the memblockq.
> > > > 
> > > > The memblockq length should always be
> > > > 
> > > >       u->latency - real sink latency - real source latency
> > > > 
> > > > and if that is less than the 500 ms that was just pushed to the queue,
> > > > the extra data will be dropped. I don't see how we can end up with too
> > > > long memblockq if adjust_memblockq() gets accurate sink and source
> > > > latency information.
> > > 
> > > Sorry, but take a look at the code. We CALCULATE how long the memblockq
> > > should be and pass that as a parameter to adjust_memblockq() AND we have
> > > to subtract the first chunk length to get things right.
> > 
> > I don't think I've said anything that conflicts with what you say here.
> > 
> > > So if the first chunk is too big, the memblockq will be too long, as
> > > simple as that.
> > 
> > I don't understand how you jump to this conclusion.
> > 
> > I'll try to explain how I understand the system:
> > 
> > Invariant 1: memblockq length should always be this when the pipeline
> > is fully up and running:
> > 
> >      u->latency - real sink latency - real source latency
> > 
> > When doing the final adjustment, the offset that is passed to
> > adjust_memblockq() should be
> > 
> >      real sink latency + real source latency
> > 
> > Invariant 2: the real source latency at the time of processing the POST
> > message can always be calculated like this:
> > 
> >      latency report - size of the received chunk + time between sending and receiving the POST message
> > 
> > You are saying that the size of the received chunk must be adjusted so
> > that it won't be bigger than effective_source_latency. Do you think
> > invariant 2 is wrong (that is, the formula doesn't always give an
> > accurate measure of the real source latency), or that the invariant 1
> > is wrong (that is, sometimes it makes sense to have a shorter memblockq
> > than given by the formula)?
> 
> Invariant 2 is wrong when the source overruns. The received chunk can be 
> much
> bigger than the latency report and your formula above will produce 
> negative values.
> It is even possible that the source is sending more than one "much too 
> big" chunk,
> so the situation that you end up with far too much data in the queue cannot
> completely be avoided. I did not put the code there for some theoretical 
> reason,
> you can easily reproduce the problem by switching rapidly between two 
> sources
> at low latency.

Ah, now I finally see the problem. To make the code more clear about
this, I would suggest adding a separate variable for the source
latency, and comment the calculation like this, for example:

/* data contains the reported source latency at the time the
 * message was sent. *
source_latency = PA_PTR_TO_UINT(data);

/* The source latency report includes the audio in the chunk,
Â * but since we already pushed the chunk to the memblockq, we need
Â * to subtract the chunk size from the source latency so that it
Â * won't be counted towards both the memblockq latency and the
Â * source latency.
Â *
Â * Sometimes the alsa source reports too small latencies for some
Â * reason. We can't reliably detect when that happens, but one
Â * obvious indicator is when the reported latency is smaller than
Â * the size of the chunk that the source generated. In those case
Â * we estimate the source latency to be zero, which may not be
Â * entirely accurate, but it's in any case more accurate than
Â * the reported latency. */
if (chunk->length <= source_latency)
    source_latency -= chunk->length;
else
    source_latency = 0;

/* offset is a timestamp of the moment when the POST message was
 * sent. It might have taken some time before we got to processing
 * the message, and during that time the source latency has grown. */
source_latency += pa_rtclock_now() - offset;

I thought that the alsa source latency reports would always be directly
based on snd_pcm_delay(), but now that I checked, we seem to be using a
smoother instead. Maybe there's some smoother adjustment missing when
the source's configured latency changes. Or maybe the smoother just
behaves badly.

-- 
Tanu

https://www.patreon.com/tanuk