Re: [PATCH RFT] mmc: tmio: avoid concurrent runs of mmc_request_done()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dirk,

> > With the to-be-fixed commit, the reset_work handler cleared 'host->mrq'
> > outside of the spinlock protected critical section. That leaves a small
> > race window during execution of 'tmio_mmc_reset()' where the done_work
> > handler could grab a pointer to the now invalid 'host->mrq'. Both would
> > use it to call mmc_request_done() causing problems (see Link).
> > 
> > However, 'host->mrq' cannot simply be cleared earlier inside the
> > critical section. That would allow new mrqs to come in asynchronously
> > while the actual reset of the controller still needs to be done. So,
> > like 'tmio_mmc_set_ios()', an ERR_PTR is used to prevent new mrqs from
> > coming in but still avoiding concurrency between work handlers.
> > 
> > Reported-by: Dirk Behme <dirk.behme@xxxxxxxxxxxx>
> > Closes: https://lore.kernel.org/all/20240220061356.3001761-1-dirk.behme@xxxxxxxxxxxx/
> > Signed-off-by: Wolfram Sang <wsa+renesas@xxxxxxxxxxxxxxxxxxxx>
> > Fixes: df3ef2d3c92c ("mmc: protect the tmio_mmc driver against a theoretical race")
> 
> Tested-by: Dirk Behme <dirk.behme@xxxxxxxxxxxx>
> Reviewed-by: Dirk Behme <dirk.behme@xxxxxxxxxxxx>

Awesome! Thanks for the super-fast tags!

> At least the issues we observed before are not seen any more. As we are not
> exactly sure on the root cause, of course this is not a 100% proof. But as
> the change looks good, looks like it won't break something and the system
> behaves good with it I would say we are good to go.

I agree. We don't know if it is all you need. But there definitely was a
race window and closing it removes some observed anomalies. Let's hope
all of them :) I looked many times at the code and, to the best of my
knowledge, don't see side effects. 'host->mrq' stays non-NULL, so new
mrqs won't be added like before. Changing it to an ERR_PTR will only
affect the check in the done_work handler which is what we want. But, of
course, more eyes are always welcome.

> I think we could add anything like
> 
> Cc: stable@xxxxxxxxxxxxxxx # 3.0+

Yes, we should definitely have that. I would have added it once your
testing got good results. This affects every Renesas SDHI or Uniphier SD
instance since 3.0 (12 years). Wow! So, thanks a ton for your report and
assistance in debugging it. Very much appreciated! And, phew, I am happy
that this solution does not make the locking more complex \o/

All the best,

   Wolfram

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Samsung SOC]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux