Re: mmc: tmio_mmc_core: Locking questions

Wolfram Sang <wsa@xxxxxxxxxx> · Sun, 18 Feb 2024 22:10:57 +0100

Hi Dirk,

hope you are doing well!

Thanks for doing this analysis and summary! Much appreciated.

> Comparing this, the questions are:
> 
> * tmio uses irqsave spin locks, while the bcm2835 uses mutex for locking.
> Why is irqsave spin lock used here?

The spinlock was introduced with 6ff56e0d8e02 ("mmc: tmio_mmc: handle
missing HW interrupts"). This was before my time with Renesas, but the
rule of thumb for chosing spinlocks over mutexes: they are lightweight.
If the critical sections are short and no sleeping is needed within
them, they can save context switching. Of course, this is not needed for
a timeout handler, but in other code paths this is more interesting.

> * In tmio the mmc_request_done() is *outside* the lock, while for bcm2835
> its *inside* the lock protected section. Why does tmio doesn't lock the
> access to mmc_request_done()?

The spinlock is to protect the private data structure, 'host' in our
case. mmc_request_done() doesn't operate on it, only with MMC core
structures 'mmc_host' and 'mmc_request'. AFAICS that doesn't need
protection. The mtk-sd driver has it also outside the critical section.
But: clearing host->mrq looks like it should be *in* the critical section.
There is another case in tmio_mmc_set_ios() which also looks suspicious
at least. I'll double check all this tomorrow.

> * How does tmio_mmc_reset_work() ensures that the content mrq points to and
> passed to mmc_request_done() is still the correct content and hasn't changed
> asynchronously in the background (due to being outside the lock)?

Because it copies 'host->mrq' to a local variable 'mrq' within a
critical section:

 236         spin_lock_irqsave(&host->lock, flags);
 237         mrq = host->mrq;

and then passes this to mmc_request_done(). It even clears host->mrq
before that to enable asynchronous processing of the next mrq:

 270         host->mrq = NULL;
 271         mmc_request_done(host->mmc, mrq);

> * Why does tmio doesn't use cancel_delayed_work()? Maybe because its ensured
> by the interrupt locking no further work is scheduled?

If you want to cancel it, I think you should rather use
cancel_delayed_work_sync() to really know that there is nothing pending
anymore. But that function sleeps. TMIO approach is to simply bail out
on later calls:

 239         /*
 240          * is request already finished? Since we use a non-blocking
 241          * cancel_delayed_work(), it can happen, that a .set_ios() call preempts
 242          * us, so, have to check for IS_ERR(host->mrq)
 243          */
 244         if (IS_ERR_OR_NULL(mrq) ||
 245             time_is_after_jiffies(host->last_req_ts +
 246                                   msecs_to_jiffies(CMDREQ_TIMEOUT))) {
 247                 spin_unlock_irqrestore(&host->lock, flags);
 248                 return;
 249         }

Other drivers do this as well, like alcor:

 318  
 319         /* 
 320          * If this work gets rescheduled while running, it will 
 321          * be run again afterwards but without any active request. 
 322          */ 
 323         if (!host->mrq) 
 324                 return; 
 325  

I'd think this part is OK but I will double check as well.^

> In sum looking at the KASAN report and the locking in tmio_mmc_reset_work()
> we have an uncomfortable feeling. bcm2835_timeout() looks much safer
> regarding the locking. Of course that is just a feeling and can't be proven.

Yes, with a mutex and putting everything inside it, it uses a big hammer
and no questions asked. I am not complaining here, that is a sensible
approach at times. We have, however, historically been introduced with
spinlocks and a more finegrained approach. I prefer to (and will do) double
check the critical sections first before considering more intrusive
approaches which would need a lot of testing, of course.

> Therefore we want to ask here :)

I hope I could help so far. I am open for discussion, of course!

All the best,

   Wolfram

Attachment:
signature.asc

Description: PGP signature