Re: [RFC PATCH v2 0/2] Fix deadlock in ufs

On Mon, Feb 01 2021 at 13:48 -0800, Alan Stern wrote:
On Mon, Feb 01, 2021 at 12:11:23PM -0800, Asutosh Das (asd) wrote:
On 1/27/2021 7:26 PM, Asutosh Das wrote:
> v1 -> v2
> Use pm_runtime_get/put APIs.
> Assuming that all bsg devices are scsi devices may break.
>
> This patchset attempts to fix a deadlock in ufs.
> This deadlock occurs because the ufs host driver tries to resume
> its child (wlun scsi device) to send SSU to it during its suspend.
>
> Asutosh Das (2):
>    block: bsg: resume scsi device before accessing
>    scsi: ufs: Fix deadlock while suspending ufs host
>
>   block/bsg.c               |  8 ++++++++
>   drivers/scsi/ufs/ufshcd.c | 18 ++----------------
>   2 files changed, 10 insertions(+), 16 deletions(-)
>

Hi Alan/Bart,

Could you please take a look at this series?
Please let me know if you have any better suggestions for this.

I haven't commented on them so far because I don't understand them.

Merging thread with Bart.

Against which kernel version has this patch series been prepared and
tested? Have you noticed the following patch series that went into
v5.11-rc1?
https://lore.kernel.org/linux-scsi/20201209052951.16136-1-bvanassche@xxxxxxx/

Hi Bart - Yes, this was tested with that series pulled in.
I'm on 5.10.9.

Thanks Alan.
I've tried to summarize below the problem that I'm seeing.

Problem:
There's a deadlock in the ufs runtime-suspend path.
Currently, the wluns are registered with request-based blk-pm.
During the ufs platform-dev runtime-suspend cb, as the protocol requires,
the host sends a few cmds (uac, ssu) to the wlun.

In this path, sending those cmds tries to resume the ufs platform device,
which is itself in the middle of suspending, and so it deadlocks.
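
To make the cycle concrete, here is a minimal userspace sketch in plain C. It is
not kernel code. The toy_* types and toy_cmd_would_deadlock() are hypothetical
names invented for illustration. The model assumes that resuming a child first
requires its ancestors to be active, and that an ancestor which is in the middle
of suspending cannot be resumed until its own suspend callback returns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy runtime-PM states (hypothetical; they only resemble the kernel's). */
enum toy_rpm_state { TOY_ACTIVE, TOY_SUSPENDING, TOY_SUSPENDED };

struct toy_dev {
    const char *name;
    enum toy_rpm_state state;
    struct toy_dev *parent;   /* NULL at the top of the hierarchy */
};

/*
 * "issuer" is the device whose suspend callback wants to send a command
 * to "target". Returns true when the command could never complete:
 * the target needs a resume, the resume needs an ancestor to be active,
 * and that ancestor is the issuer, which is still busy suspending.
 */
bool toy_cmd_would_deadlock(const struct toy_dev *target,
                            const struct toy_dev *issuer)
{
    const struct toy_dev *d;

    if (target->state == TOY_ACTIVE)
        return false;   /* no resume needed, the command can be queued */

    for (d = target->parent; d != NULL; d = d->parent) {
        if (d == issuer && d->state == TOY_SUSPENDING)
            return true;   /* issuer waits on cmd, cmd waits on issuer */
    }
    return false;
}
```

With a host in TOY_SUSPENDING and a child wlun in TOY_SUSPENDED, the function
reports the circular wait; with the wlun still TOY_ACTIVE it does not.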

Yes, if the host didn't send any commands during its suspend, there wouldn't be
this deadlock.
Setting manage_start_stop would send ssu only.
I can't seem to find a way to send cmds to the wlun during its suspend.
Would overriding sd_pm_ops for the wlun be a good idea?
I'd appreciate any other pointers on how to do this.


[RFC PATCH v2 1/2] block: bsg: resume platform device before accessing:

It may happen that the underlying device's runtime-pm is
not controlled by block-pm. So it's possible that when
commands are sent to the device, it's suspended and may not
be resumed by blk-pm. Hence explicitly resume the parent
which is the platform device.
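
The get/put bracketing this description refers to can be pictured with a small
userspace model. This is only a sketch under stated assumptions: struct
toy_pm_dev, toy_pm_get(), toy_pm_put() and toy_send_cmd() are hypothetical names,
and the model suspends immediately on the last put, whereas a real driver would
normally defer that.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a device with a runtime-PM usage counter. */
struct toy_pm_dev {
    int usage_count;   /* number of users that need the device powered */
    bool suspended;
};

/* Take a reference and make sure the device is powered. */
void toy_pm_get(struct toy_pm_dev *dev)
{
    dev->usage_count++;
    dev->suspended = false;
}

/* Drop a reference; the device may suspend once nobody needs it. */
void toy_pm_put(struct toy_pm_dev *dev)
{
    if (dev->usage_count > 0)
        dev->usage_count--;
    if (dev->usage_count == 0)
        dev->suspended = true;   /* simplification: suspend right away */
}

/* Send a command only while a reference is held; true if it was sent. */
bool toy_send_cmd(struct toy_pm_dev *dev)
{
    bool sent;

    toy_pm_get(dev);
    sent = !dev->suspended;   /* device is guaranteed powered here */
    toy_pm_put(dev);
    return sent;
}
```

The point of the pattern is that the device cannot suspend between the get and
the put, so the command is never issued to a powered-down device.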

If you want to send a command to the underlying device, why do you
resume the underlying device's _parent_?  Why not resume the device
itself?

Why is bsg sending commands to the underlying device in a way that
evades checks for whether the device is suspended?  Shouldn't the
device's driver already be responsible for automatically resuming the
device when a command is sent?

[RFC PATCH v2 2/2] scsi: ufs: Fix deadlock while suspending ufs host:

During runtime-suspend of ufs host, the scsi devices are
already suspended and so are the queues associated with them.
But the ufs host sends SSU to wlun during its runtime-suspend.
During this process, blk_queue_enter() checks whether the queue is
suspended. If it is, it waits for the queue to resume and never
returns.
Commit d55d15a33 ("scsi: block: Do not accept any requests while
suspended") added this check to blk_queue_enter().
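
The check described above can be illustrated with a small userspace model. This
is a sketch loosely patterned on the behaviour described here, not the actual
blk_queue_enter() code. struct toy_queue, its pm_only field and the pm_request
flag are assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical queue power states for this model. */
enum toy_q_status { TOY_Q_ACTIVE, TOY_Q_SUSPENDING, TOY_Q_SUSPENDED };

struct toy_queue {
    bool pm_only;               /* queue only admits power-management requests */
    enum toy_q_status status;
};

/*
 * Returns true if a request may enter the queue now. A false return is
 * where a real caller would sleep until the queue has been resumed.
 */
bool toy_queue_may_enter(const struct toy_queue *q, bool pm_request)
{
    if (!q->pm_only)
        return true;    /* queue is fully open */

    /* Even power-management requests are refused once fully suspended. */
    if (pm_request && q->status != TOY_Q_SUSPENDED)
        return true;

    return false;
}
```

In this model a request aimed at a fully suspended queue is always refused, which
corresponds to the wait that the SSU sent from the host's runtime-suspend path
never gets out of.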

Fix this by decoupling wlun scsi devices from block layer pm.
The runtime-pm for these devices would be managed by bsg and sg drivers.

Why do you need to send a command to the wlun when the host is being
suspended?  Shouldn't that command already have been sent, at the time
when the wlun was suspended?

Alan Stern


