Re: SCSI layer RPM deadlock debug suggestion

Any suggestion on how to fix this deadlock?
This is indeed a tricky question.  It seems like we should allow a
runtime resume to succeed if the only reason it failed was that the
device has been removed.

More generally, perhaps we should always consider that a runtime
resume succeeds.  Any remaining problems will be dealt with by the
device's driver and subsystem once the device is marked as
runtime-active again.

Suppose you try changing blk_post_runtime_resume() so that it always
calls blk_set_runtime_active() regardless of the value of err.  Does
that fix the problem?


Hi Alan,

I tried that suggestion with the following change:


--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -185,9 +185,8 @@ EXPORT_SYMBOL(blk_pre_runtime_resume);
  */
void blk_post_runtime_resume(struct request_queue *q, int err)
{
-
+       err = 0;
        if (!q->dev)
                return;
        if (!err) {


And that appears to solve the deadlock I was seeing. I'm not sure about side effects elsewhere.

We'll test it a bit more.

Thanks,
John

And more importantly, will it cause any other problems...?
That would cause trouble for the UFS driver and other drivers for which
runtime resume can fail due to e.g. the link between host and device
being in a bad state.

I don't understand how that could work.  If a device fails to resume
from runtime suspend, no matter whether the reason is temporary or
permanent, how can the system use it again?

And if the system can't use it again, what harm is there in pretending
that the runtime resume succeeded?

'xactly.
Especially as we _do_ have error recovery on SCSI, so we should be treating a failure to resume just like any other SCSI error; in the end, we need to equip SCSI EH to deal with these kinds of states anyway. And we already do, as we're already sending 'START STOP UNIT' to spin up drives which are found to be spun down.

So I'm all for always returning 'success' from the 'resume' callback and letting SCSI EH deal with any eventual fallout.