Re: [PATCH] scsi: core: move scsi_host_busy() out of host lock for waking up EH handler

Hannes Reinecke <hare@xxxxxxx> · Fri, 12 Jan 2024 12:12:57 +0100

On 1/12/24 08:00, Ming Lei wrote:
> Inside scsi_eh_wakeup(), scsi_host_busy() is called and checked with the
> host lock held every time, to decide whether the error handler kthread
> needs to be woken up.
>
> This can be too heavy during recovery, for example with:
>
> - N hardware queues
> - queue depth M for each hardware queue
> - each scsi_host_busy() iterating over (N * M) tags/requests
>
> If recovery is triggered while all requests are in flight, each
> scsi_eh_wakeup() is strictly serialized. By the time scsi_eh_wakeup() is
> called for the last in-flight request, scsi_host_busy() has been run
> (N * M - 1) times, and requests have been iterated
> (N * M - 1) * (N * M) times.
>
> If both N and M are big enough, a hard lockup can be triggered while
> acquiring the host lock. This has been observed on mpi3mr (128 hw queues,
> queue depth 8169).
>
> Fix the issue by calling scsi_host_busy() outside the host lock. We
> don't need the host lock for getting the busy count because the host
> lock never covers that.
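
To put a number on the estimate in the quoted description, here is a small
back-of-the-envelope helper. It is only a sketch of the arithmetic, not kernel
code: the function name eh_wakeup_scan_cost() and its parameters are made up
for illustration, and it simply evaluates (N * M - 1) * (N * M) as stated
above.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the SCSI core): number of requests
 * visited in total when scsi_host_busy() is run (N * M - 1) times and
 * each run walks all N * M tags, as described in the patch text.
 */
static uint64_t eh_wakeup_scan_cost(uint64_t nr_hw_queues, uint64_t queue_depth)
{
	uint64_t in_flight = nr_hw_queues * queue_depth;	/* N * M */

	if (in_flight == 0)
		return 0;

	return (in_flight - 1) * in_flight;
}
```

For the mpi3mr figures quoted above (128 hw queues, queue depth 8169) this
gives 1,045,632 in-flight requests and about 1.09 * 10^12 request visits in
total, under the stated assumption that every wakeup is serialized.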

Can you share details for the hard lockup?
I do agree that scsi_host_busy() is an expensive operation, so it
might not be ideal to call it under a spin lock.
But I wonder where the lockup comes in here.
Care to explain?

And if it leads to a lockup, aren't other instances calling
scsi_host_busy() under a spinlock affected as well?

Cheers,

Hannes