Re: [LSF/MM TOPIC] Improving Asynchronous SCSI Disk Probing

"chenxiang (M)" <chenxiang66@xxxxxxxxxxxxx> · Fri, 26 Jan 2018 09:29:03 +0800

On 2018/1/18 7:24, Bart Van Assche wrote:
When the SCSI scanning code discovers a SCSI device it calls the driver core
function device_add() to associate a SCSI ULD with the device. The driver
core invokes the probing function for the matching SCSI ULD, e.g. sd_probe().
In order to minimize the time needed to scan SCSI targets that have a large
number of LUNs, the SCSI disk driver starts the actual probing work
asynchronously from inside sd_probe().
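
As a rough illustration of the pattern described above, here is a toy
user-space model in Python. It is not kernel code: ToyDisk, sd_probe_model()
and scan_target() are hypothetical names invented for this sketch, and a
thread blocked on an event stands in for the slow per-LUN work. It only shows
that every probe call returns before its disk is actually ready.

```python
import threading


class ToyDisk:
    """Hypothetical stand-in for one LUN; not a kernel structure."""

    def __init__(self, lun):
        self.lun = lun
        self.ready = threading.Event()  # set once the slow part has finished


def sd_probe_model(disk, release, workers):
    """Toy model of a probe function that defers its slow part."""

    def async_part():
        release.wait()      # stands in for slow device I/O
        disk.ready.set()

    worker = threading.Thread(target=async_part)
    worker.start()
    workers.append(worker)
    return 0                # reports success before the disk is ready


def scan_target(num_luns):
    """Probe all LUNs, then report how many were ready at each point."""
    release = threading.Event()
    workers = []
    disks = [ToyDisk(lun) for lun in range(num_luns)]
    for disk in disks:
        sd_probe_model(disk, release, workers)
    # All probe calls have returned, but no background work has completed.
    ready_at_return = sum(d.ready.is_set() for d in disks)
    release.set()           # let the background work finish
    for worker in workers:
        worker.join()
    ready_after_join = sum(d.ready.is_set() for d in disks)
    return ready_at_return, ready_after_join
```

In this model the scan loop finishes quickly because no probe call blocks on
the device; the cost is that "probe returned" no longer means "disk ready".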

An unfortunate aspect of how SCSI disk probing works today is that there is
unnecessary serialization between probing and removal activity. For a
possible approach to increasing SCSI disk probing concurrency, see also
[PATCH] sd: Increase SCSI disk probing concurrency, linux-scsi mailing list,
December 2017 (https://www.spinics.net/lists/linux-scsi/msg115657.html).

A second unfortunate aspect of SCSI disk probing is that certain race
conditions in the block layer are hit if removal starts before asynchronous
probing has finished. This is because the driver core is unaware that the
SCSI disk code works asynchronously.

Additionally, the SCSI disk asynchronous probing approach is incompatible
with the power management code. The power management code calls
wait_for_device_probe() in the driver core to wait for device probing
activity to finish. However, wait_for_device_probe() is unaware of the
asynchronous probing in the SCSI sd driver and hence doesn't wait for the
SCSI sd probing activity to finish.
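
The mismatch described above can be sketched with a toy user-space model in
Python. This is only an illustration of the claim in the text, not the driver
core's actual implementation: ToyDriverCore, its in-flight counter, and
demo() are assumptions made up for this sketch. The waiter tracks only work
that runs inside the probe call, so it returns while the handed-off work is
still pending.

```python
import threading


class ToyDriverCore:
    """Hypothetical model: tracks only probes that run synchronously."""

    def __init__(self):
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._in_flight = 0

    def probe(self, probe_fn):
        with self._lock:
            self._in_flight += 1
        try:
            return probe_fn()
        finally:
            with self._idle:
                self._in_flight -= 1
                self._idle.notify_all()

    def wait_for_probe(self, timeout=None):
        """Toy analogue of waiting for probing to finish."""
        with self._idle:
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout)


def demo():
    core = ToyDriverCore()
    release = threading.Event()
    done = threading.Event()

    def async_part():
        release.wait()      # stands in for the slow asynchronous probing
        done.set()

    worker = threading.Thread(target=async_part)

    def toy_sd_probe():
        worker.start()      # hand off the slow work; the core never sees it
        return 0

    core.probe(toy_sd_probe)
    waited = core.wait_for_probe(timeout=1.0)   # counter is already zero
    finished_when_wait_returned = done.is_set()
    release.set()
    worker.join()
    return waited, finished_when_wait_returned, done.is_set()
```

Here the wait succeeds immediately even though the background part has not
run, which is the kind of gap a suspend or removal path could fall into.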

I encountered and reported a similar issue, in which there seems to be a
race between device_resume() and disk removal:
https://www.spinics.net/lists/linux-scsi/msg115069.html

My proposal is to hold a session to discuss potential solutions for
increasing SCSI disk probing concurrency in a way that is compatible with
the driver core and the power management subsystem.