Re: [bug report] shared tags causes IO hang and performance drop

Douglas Gilbert <dgilbert@xxxxxxxxxxxx> · Tue, 20 Apr 2021 00:54:49 -0400

On 2021-04-19 11:22 p.m., Bart Van Assche wrote:
On 4/19/21 8:06 PM, Douglas Gilbert wrote:
I have always suspected that under extreme pressure the block layer (or scsi
mid-level) does strange things, like an IO hang. Attempts to prove that
usually lead back to my own code :-). But I have one example recently
where upwards of 10 commands had been submitted (blk_execute_rq_nowait())
and the following one stalled (all on the same thread). Seconds later
those 10 commands reported DID_TIME_OUT, the stalled thread awoke, and
my dd variant went to its conclusion (reporting 10 errors). Following
copies showed no ill effects.

My weapons of choice are sg_dd, actually sgh_dd and sg_mrq_dd. Those last
two monitor for stalls during the copy. Each submitted READ and WRITE
command gets its pack_id from an incrementing atomic and a management
thread in those copies checks every 300 milliseconds that the atomic's
value is greater than at the previous check. If not, it dumps the state of
the sg driver. The stalled request was in busy state with a timeout of 1
nanosecond which indicated that blk_execute_rq_nowait() had not been
called. So the chief suspect would be blk_get_request() followed by
the bio setup calls IMO.
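
A minimal sketch of that stall check, written in Python for illustration
only. It is not the sgh_dd or sg_mrq_dd code: the class name StallMonitor,
its methods and the on_stall callback are hypothetical, and a lock-protected
integer stands in for the atomic. The callback is where a real tool would
dump the sg driver state.

```python
import threading


class StallMonitor:
    """Watch an incrementing pack_id and report when it stops advancing."""

    def __init__(self, interval_s=0.3, on_stall=None):
        self.interval_s = interval_s          # 300 ms, as in the description
        self.on_stall = on_stall or (lambda pack_id: None)
        self.stalls = []                      # pack_id seen at each stall
        self._pack_id = 0
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def next_pack_id(self):
        """Called by a worker for each submitted READ or WRITE command."""
        with self._lock:
            self._pack_id += 1
            return self._pack_id

    def current(self):
        with self._lock:
            return self._pack_id

    def check(self, previous):
        """Return (stalled, current); stalled means no progress since previous."""
        now = self.current()
        return now <= previous, now

    def _run(self):
        previous = self.current()
        # Event.wait() returns False on timeout, True once stop() is called.
        while not self._stop.wait(self.interval_s):
            stalled, previous = self.check(previous)
            if stalled:
                self.stalls.append(previous)
                self.on_stall(previous)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Workers would call next_pack_id() once per submitted command, and the copy
would call start() before the first command and stop() after the last one.
Note that this sketch also reports a stall if no command has been submitted
by the first check.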

So it certainly looked like an IO hang, not a locking, resource, or
corruption issue IMO. That was with a branch off MKP's
5.13/scsi-staging branch taken a few weeks back. So basically
lk 5.12.0-rc1.

Hi Doug,

If it would be possible to develop a script that reproduces this hang and
if that script can be shared I will help with root-causing and fixing this
hang.

Possible, but not very practical:
   1) apply supplied 83 patches to sg driver
   2) apply pending patch to scsi_debug driver
   3) find a stable kernel platform (perhaps not lk 5.12.0-rc1)
   4) run supplied scripts for three weeks
   5) dig through the output and maybe find one case (there were lots
      of EAGAINs from blk_get_request() but they are expected when
      thrashing the storage layers)

My basic testing strategy may be useful for others:
    sg_dd iflag=random bs=512 of=/dev/sg6
    sg_dd if=/dev/sg6 bs=512 of=/dev/sg7
    sg_dd --verify if=/dev/sg6 bs=512 of=/dev/sg7

If the copy works, so should the verify (compare). The sg_dd utility in
sg3_utils release 1.46 is needed to support iflag=random in the first
line and the --verify in the third line.
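
For repeated runs, the three commands can be wrapped in a small driver. The
following is a sketch only: the helper names build_sequence and run_sequence
are hypothetical, the device nodes default to the ones used above, and the
only sg_dd options used are the ones already shown. It assumes sg_dd exits
with a non-zero status on failure. Both devices are overwritten.

```python
import subprocess


def build_sequence(src="/dev/sg6", dst="/dev/sg7", bs=512):
    """Return the fill, copy and verify commands as argv lists."""
    block = f"bs={bs}"
    return [
        # 1) fill the source device with random data
        ["sg_dd", "iflag=random", block, f"of={src}"],
        # 2) copy source to destination
        ["sg_dd", f"if={src}", block, f"of={dst}"],
        # 3) compare source against destination
        ["sg_dd", "--verify", f"if={src}", block, f"of={dst}"],
    ]


def run_sequence(commands, runner=subprocess.run):
    """Run each command in order; return the first failing argv, or None."""
    for argv in commands:
        result = runner(argv)
        if result.returncode != 0:
            return argv
    return None
```

Calling run_sequence(build_sequence()) in a loop and stopping on the first
non-None result gives a simple soak test. The runner argument exists so the
sequence can be exercised without touching real devices.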

If the backing LLD is scsi_debug, then its per_host_store=1 module parameter
is needed. Best
not to use SSDs. The above pattern will work just as well for /dev/sd*
device nodes, but iflag= and oflag= lists must contain the sgio flag.
Then ioctl(/dev/sd*, SG_IO, ...) is used for IO. The limitations of the
third line could be bypassed with something like:
    cmp /dev/sd6 /dev/sd7

If real disks are used, all user data will be trashed.

Doug Gilbert