Re: [PATCH 3/3] ASoC: max98090: fix possible race conditions

Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx> · Wed, 20 Nov 2019 21:20:27 -0600

max98090_interrupt      max98090_pll_work      max98090 codec
----------------------------------------------------------------------
                                                 ULK = 1
receive ULK INT
read 0x01
                                                 ULK = 0 (clear on read)
schedule max98090_pll_work
                          restart max98090 codec
                                                 ULK = 1
receive ULK INT
read 0x01
                                                 ULK = 0 (clear on read)
                          read 0x1
                          assert ULK == 0 (2).

what are those 0x01 and 0x1? is the second a typo possibly?

ACK, a typo.

In the case (2), both max98090_interrupt and max98090_pll_work read
the same clear-on-read register.  max98090_pll_work would falsely
thought PLL is locked.

There are 2 possible options:
A. turn off ULK interrupt before scheduling max98090_pll_work; and turn
on again before exiting max98090_pll_work.
B. remove the second thread of execution.

Adopts option B which is more straightforward.

but has the side effect of possibly adding a 10ms delay in the interrupt
thread?

(forgot to mention) Option A cannot fix the case (2) race condition:
there would be 2 threads read the same clear-on-read register.  In
theory, the hardware should faster than CPUs' accesses via I2C.
max98090 should returns ULK=1 any time if PLL is unlocked.  Shall we
ignore the case (2) and adopt option A?

I don't have any specific recommendation, just trying to follow your 
line of thought on a problem that bugged be in the past. If your tests 
shows that the extra delay is fine, then it's good progress

Still you may want to clarify your second point and the commit message. 
The first race condition was clear, the second one not so much.

what did you mean with 'assert ULK == 0' and what does this map to in 
pll_work

the code looks as this:

static void max98090_pll_work(struct work_struct *work)
{
	if (!snd_soc_component_is_active(component))
		return;

	/* Toggle shutdown OFF then ON */
	snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN,
			    M98090_SHDNN_MASK, 0);

	snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN,
			    M98090_SHDNN_MASK, M98090_SHDNN_MASK);

	/* Give PLL time to lock */
	msleep(10);
}

so in what does the race occur?

or was it race with the other work queue, which starts like this:

static void max98090_pll_det_enable_work(struct work_struct *work)
{
	struct max98090_priv *max98090 =
		container_of(work, struct max98090_priv,
			     pll_det_enable_work.work);
	struct snd_soc_component *component = max98090->component;
	unsigned int status, mask;

	/*
	 * Clear status register in order to clear possibly already occurred
	 * PLL unlock. If PLL hasn't still locked, the status will be set
	 * again and PLL unlock interrupt will occur.
	 * Note this will clear all status bits
	 */
	regmap_read(max98090->regmap, M98090_REG_DEVICE_STATUS, &status);

So here indeed there is the same read, but what is the consequence of 
the read?

did you mean that in the pll_det_enable_work the value of 'status' may 
be incorrect as it was cleared already?

But the interrupt does not schedule the pll_det_enable_work(), it 
happens on a trigger, so how would the race happen?

it'd be good if you provide more details on the flow so that all 
reviewers get the ideas, it's a good opportunity to fix this for good 
and move on.
_______________________________________________
Alsa-devel mailing list
Alsa-devel@xxxxxxxxxxxxxxxx
https://mailman.alsa-project.org/mailman/listinfo/alsa-devel