Re: [PATCH 3/3] echoaudio: Address bugs in the interrupt handling

Mark Hills <mark@xxxxxxxx> · Thu, 18 Jun 2020 14:22:24 +0100 (BST)

On Thu, 18 Jun 2020, Mark Hills wrote:

> On Thu, 18 Jun 2020, Takashi Iwai wrote:
> 
> > On Thu, 18 Jun 2020 13:07:33 +0200,
> > Mark Hills wrote:
> > > 
> > > On Thu, 18 Jun 2020, Takashi Iwai wrote:
> > > 
> > > > On Wed, 17 Jun 2020 12:51:05 +0200,
> > > > Mark Hills wrote:
> > > [...]
> > > But could I please ask for help with the bigger picture? As it feels 
> > > mismatched.
> > > 
> > > * Code should take every possible opportunity to update the stream 
> > >   position; interrupts, or explicit pcm_pointer calls (whereas the docs 
> > >   guide towards doing it in the interrupt handler)
> > > 
> > > * I critiqued (elsewhere in thread) the older interrupt handler for 
> > >   checking the counter, unlocking, calling back into PCM core and checking 
> > >   again a moment later. Whereas this is considered good behaviour.
> > > 
> > > * Seems like the overall aim is for userland to be able (if it wants to)  
> > >   to poll the soundcard, even outside of periods.
> > 
> > Right, the user-space can query the current position at any time, and
> > the driver should return the position as precisely as possible.
> > 
> > Some applications (like PulseAudio) sets the interrupt as minimum as
> > possible while it does schedule the update by itself, judging the
> > position via the ioctl.  In such operational mode, the accuracy of the
> > current position query is vital.
> > 
> > > If all the above is true, I would expect interrupt handling to be very 
> > > simple -- it would straight away call into PCM core, join existing if the 
> > > codepaths (to take locks) and do a position update. PCM core would decide 
> > > if a period really elapsed, not the driver. But this is not how it works.
> > > 
> > > This now relates strongly to a question of locking:
> > > 
> > > I ran the code (top of this message) all day, with a few instances in 
> > > dmesg (at irregular intervals, not wrapping):
> > > 
> > >   [161644.076666] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > >   [163232.392501] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > >   [164976.098069] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > >   [165054.946225] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > >   [165312.141545] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
> > > 
> > > A definite bug, of course.
> > > 
> > > However (and I am happy to be corrected) the function never finishes with 
> > > position == buffer size. Only way is a race between interrupt handler and 
> > > pcm_pointer.
> > > 
> > > Therefore one of these is needed:
> > > 
> > > * pcm_pointer locks chip->lock
> > > 
> > >   Even though the docs emphasise PCM core has exclusivity, it it not worth 
> > >   much as it does not protect against the interrupt handler.
> > > 
> > >   But now interrupt handler becomes ugly in total: take chip->lock, check 
> > >   the counter, release chip->lock, go to PCM core (which takes middle 
> > >   layer locks), take chip->lock again, check counter again, release 
> > >   chip->lock again.
> > 
> > Yes, that's the most robust way to go.  If the lock is really costing
> > too much, you can consider a fast-path by some flag for the irq ->
> > snd_pcm_period_elapsed() case.
> 
> I don't understand how a fast path could be made to work, as it can't pass 
> data across snd_pcm_period_elapsed() and it still must syncronise access 
> between reading dma_counter and writing pipe->position.
> 
> Hence questioning if a better design is simpler interrupt handlers that 
> just enter PCM core.
>  
> > Basically you can do everything in the pointer callback.  The only 
> > requirement in the interrupt handle is basically judge whether you need 
> > the call of snd_pcm_period_elapsed() and call it.  The rest update could 
> > be done in the other places.
> 
> Thanks, please just another clarification:
> 
> I presume that calls to pcm_pointer are completely independent of the 
> period notifications?
> 
> ie. period notifications are at regular intervals, regardless of whether 
> pcm_pointer is called inbetween. pcm_pointer must not reset any state used 
> by the interrupt.
> 
> Which means we must handle when non-interrupt call to pcm_pointer causes a 
> period to elapse. The next interrupt handler must notify.
> 
> I can see in the original code uses chip->last_period exclusively by the 
> interrupt handler, and I removed it. Some comments around the intent would 
> help. I'll cross reference the original code with my new understanding.
> 
> My instinct here is that to preserve
> 
> - regular period notifications
> 
> - handle period_size not aligning with 32-bit counter
> 
> - no races between interrupt and pcm_pointer
> 
> that the clearest and bug-free implementation may be to separate the 
> interrupt (notifications) and pointer updates to separate state.
> 
> Then there's no lock and the only crossover is an atomic read of 
> dma_counter.
> 
> And that's what I will try -- thanks.

Ok so the implementation would look something like this below,
which I will run for the rest of the day:

* Clear separation of the period notification from position updates; only 
  syncronising is around dma_counter, no need for locks

* All counting is accumulated to avoid bugs in the cases of wrapping
  and non-alignment

It's easier to see in the end results but of course I'll work on a clear 
diff.

---

/******************************************************************************
	IRQ Handling
******************************************************************************/
/* Check if a period has elapsed since last interrupt
 *
 * Don't make any updates to state; PCM core handles this with the
 * correct locks.
 *
 * \return true if a period has elapsed, otherwise false
 */
static bool period_has_elapsed(struct snd_pcm_substream *substream)
{
	struct snd_pcm_runtime *runtime = substream->runtime;
	struct audiopipe *pipe = runtime->private_data;
	u32 counter, step;
	size_t period_bytes;

	if (pipe->state != PIPE_STATE_STARTED)
		return false;

	period_bytes = frames_to_bytes(runtime, runtime->period_size);

	counter = le32_to_cpu(*pipe->dma_counter);  /* presumed atomic */

	step = counter - pipe->last_period;  /* handles wrapping */
	step -= step % period_bytes;  /* acknowledge whole periods only */

	if (step == 0)
		return false;  /* haven't advanced a whole period yet */

	pipe->last_period += step;  /* used exclusively by us */
	return true;
}

static irqreturn_t snd_echo_interrupt(int irq, void *dev_id)
{
	struct echoaudio *chip = dev_id;
	int ss, st;

	spin_lock(&chip->lock);
	st = service_irq(chip);
	if (st < 0) {
		spin_unlock(&chip->lock);
		return IRQ_NONE;
	}
	/* The hardware doesn't tell us which substream caused the irq,
	thus we have to check all running substreams. */
	for (ss = 0; ss < DSP_MAXPIPES; ss++) {
		struct snd_pcm_substream *substream;

		substream = chip->substream[ss];
		if (substream && period_has_elapsed(substream)) {
			spin_unlock(&chip->lock);
			snd_pcm_period_elapsed(substream);
			spin_lock(&chip->lock);
		}
	}
	spin_unlock(&chip->lock);

#ifdef ECHOCARD_HAS_MIDI
	if (st > 0 && chip->midi_in) {
		snd_rawmidi_receive(chip->midi_in, chip->midi_buffer, st);
		dev_dbg(chip->card->dev, "rawmidi_iread=%d\n", st);
	}
#endif
	return IRQ_HANDLED;
}

static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
{
	struct snd_pcm_runtime *runtime = substream->runtime;
	struct audiopipe *pipe = runtime->private_data;
	u32 counter, step;

	/*
	 * IRQ handling runs concurrently. Do not share tracking of
	 * counter with it, which would race or require locking
	 */

	counter = le32_to_cpu(*pipe->dma_counter);  /* presumed atomic */

	step = counter - pipe->last_counter;  /* handles wrapping */
	pipe->last_counter = counter;

	/* counter doesn't neccessarily wrap on a multiple of
	 * buffer_size, so can't derive the position; must
	 * accumulate */

	pipe->position += step;
	pipe->position %= frames_to_bytes(runtime, runtime->buffer_size); /* wrap */

	return bytes_to_frames(runtime, pipe->position);
}

-- 
Mark