Re: mmc: tmio_mmc_core: Locking questions

"Behme Dirk (CM/ESO2)" <dirk.behme@xxxxxxxxxxxx> · Mon, 19 Feb 2024 08:22:07 +0100

On 19.02.2024 07:27, Behme Dirk (CM/ESO2) wrote:
Hi Wolfram,

On 18.02.2024 22:10, Wolfram Sang wrote:
Hi Dirk,

hope you are doing well!

Thanks for doing this analysis and summary! Much appreciated.

Many thanks for answering so fast! :)

Comparing this, the questions are:
=20
* tmio uses irqsave spin locks, while the bcm2835 uses mutex for 
locking.
Why is irqsave spin lock used here?

The spinlock was introduced with 6ff56e0d8e02 ("mmc: tmio_mmc: handle
missing HW interrupts"). This was before my time with Renesas, but the
rule of thumb for chosing spinlocks over mutexes: they are lightweight.
If the critical sections are short and no sleeping is needed within
them, they can save context switching. 

Yes, I agree, the question regarding spin lock vs. mutex was non-sense 
;)  For synchronizing with the interrupt handler we need to use spin 
lock, as (potentially sleeping) mutex in interrupt handler are not allowed.

So I'm totally fine for using spin locks for synchronization :)

Remains the question if we are locking "enough".

Of course, this is not needed for
a timeout handler, but in other code paths this is more interesting.

* In tmio the mmc_request_done() is *outside* the lock, while for 
bcm2835
its *inside* the lock protected section. Why does tmio doesn't lock the
access to mmc_request_done()?

The spinlock is to protect the private data structure, 'host' in our
case. mmc_request_done() doesn't operate on it, only with MMC core
structures 'mmc_host' and 'mmc_request'. AFAICS that doesn't need
protection. The mtk-sd driver has it also outside the critical section.
But: clearing host->mrq looks like it should be *in* the critical 
section.
There is another case in tmio_mmc_set_ios() which also looks suspicious
at least. I'll double check all this tomorrow.

* How does tmio_mmc_reset_work() ensures that the content mrq points 
to a=
nd
passed to mmc_request_done() is still the correct content and hasn't 
chan=
ged
asynchronously in the background (due to being outside the lock)?

Because it copies 'host->mrq' to a local variable 'mrq' within a
critical section:

  236         spin_lock_irqsave(&host->lock, flags);
  237         mrq =3D host->mrq;

and then passes this to mmc_request_done(). It even clears host->mrq
before that to enable asynchronous processing of the next mrq:

  270         host->mrq =3D NULL;
  271         mmc_request_done(host->mmc, mrq);

Of course I'm not an expert on this driver ;)

So I'm not sure where this mrq comes from. In the end we just copy the 
pointer to the mrq to our local variable. What I don't know if anybody 
else holds a reference to it and might change/discard this mrq in 
parallel/in the background? Or create a new mrq making the one we are 
currently processing invalid? Just some thoughts, though.

What I think I wanted to say ;)

Would it be a problem that both work queues, tmio_mmc_reset_work() and 
tmio_mmc_done_work() might operate on mrq the same time without 
synchronization?

Simplified:

INIT_DELAYED_WORK(&_host->delayed_reset_work, tmio_mmc_reset_work);
static void tmio_mmc_reset_work(struct work_struct *work)
{
	struct mmc_request *mrq;

	spin_lock_irqsave(&host->lock, flags);
	mrq = host->mrq;
	spin_unlock_irqrestore(&host->lock, flags);

	mmc_request_done(host->mmc, mrq);
}

INIT_WORK(&_host->done, tmio_mmc_done_work);
static void tmio_mmc_done_work(struct work_struct *work)
{
	struct mmc_request *mrq;

	spin_lock_irqsave(&host->lock, flags);
	mrq = host->mrq;
	spin_unlock_irqrestore(&host->lock, flags);

	if (host->mrq && !mrq->cmd->error) {
		tmio_process_mrq(host, mrq);
		return;
	}

	if (host->fixup_request)
		host->fixup_request(host, mrq);

	mmc_request_done(host->mmc, mrq);
}

Best regards

Dirk