Re: blktest failures

Bart Van Assche <bvanassche@xxxxxxx> · Sat, 9 Apr 2022 14:47:11 -0700

On 4/9/22 14:43, Bob Pearson wrote:
> On 4/9/22 00:04, Christoph Hellwig wrote:
>> On Fri, Apr 08, 2022 at 04:25:12PM -0700, Bart Van Assche wrote:
>>> One of the functions in the above call stack is sd_remove(). sd_remove()
>>> calls del_gendisk() just before calling sd_shutdown(). sd_shutdown()
>>> submits the SYNCHRONIZE CACHE command. In del_gendisk() I found the
>>> following comment: "Fail any new I/O". Do you agree that failing new I/O
>>> before sd_shutdown() is called is wrong? Is there any other way to fix this
>>> than moving the blk_queue_start_drain() etc. calls out of del_gendisk() and
>>> into a new function?
>>
>> That SYNCHRONIZE CACHE is a passthrough command sent on the request_queue
>> and should not be affected by stopping all file system I/O.
>
> When I run check -q srp
> all the test cases pass but each one stops for 3+ minutes at synchronize cache.
> The rxe device is still active until sync cache returns when the last QP and the PD
> are destroyed. It may be that the queues are blocked waiting for something else
> even though they have reported success??

Hi Bob,

After having taken a closer look at del_gendisk(), I agree with what 
Christoph wrote above. Please revert patch "scsi: scsi_debug: Address 
races following module load" locally when running blktests. See also 
https://lore.kernel.org/linux-scsi/5fb68dbd-ae0e-6230-8f9f-dd6df5593584@xxxxxxxxxxxx/T/#m47a23ffd5ce68b8183100444d6e711b6b4aba393.
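
To make the distinction Christoph draws concrete, here is a toy user-space
model, not kernel code. Every name in it (toy_disk, toy_submit, toy_del_disk)
is made up for illustration, and it only sketches my reading of the point:
once the disk has been deleted, new file system I/O is failed, while a
passthrough request such as SYNCHRONIZE CACHE is still admitted as long as
the request queue itself has not been torn down.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_req_kind { TOY_FS_IO, TOY_PASSTHROUGH };

struct toy_disk {
    bool dead;         /* set by the toy "delete disk" step */
    bool queue_dying;  /* set only when the whole queue is torn down */
};

/* Returns true if the request would be admitted to the queue. */
static bool toy_submit(const struct toy_disk *d, enum toy_req_kind kind)
{
    if (d->queue_dying)
        return false;  /* nothing is admitted once the queue is dying */
    if (kind == TOY_FS_IO && d->dead)
        return false;  /* "Fail any new I/O" covers file system I/O only */
    return true;       /* passthrough requests are still admitted */
}

/* Stand-in for the step that marks the disk as gone. */
static void toy_del_disk(struct toy_disk *d)
{
    d->dead = true;
}
```

In this model, calling toy_del_disk() and then toy_submit() with
TOY_PASSTHROUGH still returns true, which is the ordering sd_remove() relies
on when sd_shutdown() runs after del_gendisk().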

Thanks,

Bart.