Hi,

Thanks for the comments. I tried to run sg_write_same over the sg device and
I am seeing the same problem. The result is as follows:

0. Kernel configs
   Version: 5.1-rc1
   Boot parameter: dm_mod.use_blk_mq=Y scsi_mod.use_blk_mq=Y

1. Normal state : (As expected) The command succeeded
   $ sg_write_same --lba=100 --xferlen=512 /dev/sg5
   $

2. Immediately after bringing down the iSCSI interface at the target :
   (As expected) Failed with DID_TRANSPORT_DISRUPTED after a few seconds
   $ sg_write_same --lba=100 --xferlen=512 /dev/sg5
   Write same: transport: Host_status=0x0e [DID_TRANSPORT_DISRUPTED]
   Driver_status=0x00 [DRIVER_OK, SUGGEST_OK]
   Write same(10) command failed

3. Immediately after the DID_TRANSPORT_DISRUPTED error : (BUG) The command
   succeeded after a few seconds
   $ sg_write_same --lba=100 --xferlen=512 /dev/sg5
   $

   Kernel logs:
   Apr 8 18:28:03 init21-16 kernel: session1: session recovery timed out after 10 secs
   Apr 8 18:28:03 init21-16 kernel: sd 8:0:0:5: rejecting I/O to offline device

4. Issued I/O again : (As expected) The command failed
   $ sg_write_same --lba=100 --xferlen=512 /dev/sg5
   Write same: pass through os error: No such device or address
   Write same(10) command failed

Let me upload my fix for this; please give me your comments on it.

Thanks,

Jaesoo Lee

---------- Forwarded message ---------
From: Douglas Gilbert <dgilbert@xxxxxxxxxxxx>
Date: Wed, Apr 3, 2019 at 2:06 PM
Subject: Re: [bug report][scsi blk_mq] data corruption due to a bug in SCSI LLD error handling
To: Jaesoo Lee <jalee@xxxxxxxxxxxxxxx>, Jens Axboe <axboe@xxxxxxxxx>,
Lee Duncan <lduncan@xxxxxxxx>, Chris Leech <cleech@xxxxxxxxxx>,
James E.J. Bottomley <jejb@xxxxxxxxxxxxxxxxxx>,
Martin K. Petersen <martin.petersen@xxxxxxxxxx>
Cc: <linux-block@xxxxxxxxxxxxxxx>, <open-iscsi@xxxxxxxxxxxxxxxx>,
<linux-scsi@xxxxxxxxxxxxxxx>

On 2019-04-03 4:18 p.m., Jaesoo Lee wrote:
> Hello All,
>
> I encountered this bug while trying to enable dm_blk_mq for our
> iSCSI/FCP targets.
>
> The bug is that an sg_io issued to scsi_blk_mq would succeed even if
> the LLD wants to error out those requests.
>
> Let me explain the scenario in more detail.
>
> Setup:
> 0. Host kernel configuration
>    - 4.19.9, 4.20.16
>    - boot parameter: dm_mod.use_blk_mq=Y scsi_mod.use_blk_mq=Y
>      scsi_transport_iscsi.debug_session=1 scsi_transport_iscsi.debug_conn=1
>
> Scenario:
> 1. Connect the host to the iSCSI target via four paths
>    : A dm device is created for those target devices
> 2. Start an application on the host which generates sg_io ioctls for
>    XCOPY and WSAME to the dm device at a ratio of around 50%
>    (pread/pwrite for the rest).
> 3. Perform a system crash (sysrq-trigger) in the iSCSI target
>
> Expected result:
> - Any outstanding I/Os should get failed with errors
>
> Actual results:
> - Normal read/write I/Os get failed as expected
> - SG_IO ioctls SUCCEEDED!!

Not all ioctl(SG_IO)s are created equal!

If you are using the sg v3 interface (i.e. struct sg_io_hdr) then I would
expect DRIVER_TIMEOUT in sg_io_obj.driver_status or DID_TIME_OUT in
sg_io_obj.host_status to be set on completion. [BTW You will _not_ see an
ETIMEDOUT errno; only errors prior to submission yield errno style errors.]

If you don't see that with ioctl(SG_IO) on a block device then try again on
an sg device. If neither reports that then the mid-level error processing is
broken.

Doug Gilbert

> - log message:
> [Tue Apr  2 11:26:34 2019] session3: session recovery timed out after 11 secs
> [Tue Apr  2 11:26:34 2019] session3: session_recovery_timedout:
> Unblocking SCSI target
> ..
> [Tue Apr  2 11:26:34 2019] sd 8:0:0:8: scsi_prep_state_check:
> rejecting I/O to offline device
> [Tue Apr  2 11:26:34 2019] sd 8:0:0:8: scsi_prep_state_check:
> rejecting I/O to offline device
> [Tue Apr  2 11:26:34 2019] sd 8:0:0:8: scsi_prep_state_check:
> rejecting I/O to offline device
> [Tue Apr  2 11:26:34 2019] print_req_error: I/O error, dev sdi, sector 30677580
> [Tue Apr  2 11:26:34 2019] device-mapper: multipath: Failing path 8:128.
> [Tue Apr  2 11:26:34 2019] SG_IO disk=sdi, result=0x0
>
> - This causes data corruption for the application
>
> Relevant call stacks: (SG_IO issue path)
> [Tue Apr  2 11:26:33 2019] sd 8:0:0:8: [sdi] sd_ioctl: disk=sdi, cmd=0x2285
> [Tue Apr  2 11:26:33 2019] SG_IO disk=sdi, retried 1 cmd 93
> [Tue Apr  2 11:26:33 2019] CPU: 30 PID: 16080 Comm: iostress Not
> tainted 4.19.9-purekernel_dbg.x86_64+ #30
> [Tue Apr  2 11:26:33 2019] Hardware name: /0JP31P, BIOS 2.0.19 08/29/2013
> [Tue Apr  2 11:26:33 2019] Call Trace:
> [Tue Apr  2 11:26:33 2019]  dump_stack+0x63/0x85
> [Tue Apr  2 11:26:33 2019]  sg_io+0x41e/0x480
> [Tue Apr  2 11:26:33 2019]  scsi_cmd_ioctl+0x297/0x420
> [Tue Apr  2 11:26:33 2019]  ? sdev_prefix_printk+0xe9/0x120
> [Tue Apr  2 11:26:33 2019]  ? _cond_resched+0x1a/0x50
> [Tue Apr  2 11:26:33 2019]  scsi_cmd_blk_ioctl+0x51/0x70
> [Tue Apr  2 11:26:33 2019]  sd_ioctl+0x95/0x1a0 [sd_mod]
> [Tue Apr  2 11:26:33 2019]  __blkdev_driver_ioctl+0x25/0x30
> [Tue Apr  2 11:26:33 2019]  dm_blk_ioctl+0x79/0x100 [dm_mod]
> [Tue Apr  2 11:26:33 2019]  blkdev_ioctl+0x89a/0x940
> [Tue Apr  2 11:26:33 2019]  ? do_nanosleep+0xae/0x180
> [Tue Apr  2 11:26:33 2019]  block_ioctl+0x3d/0x50
> [Tue Apr  2 11:26:33 2019]  do_vfs_ioctl+0xa6/0x600
> [Tue Apr  2 11:26:33 2019]  ? __audit_syscall_entry+0xe9/0x130
> [Tue Apr  2 11:26:33 2019]  ksys_ioctl+0x6d/0x80
> [Tue Apr  2 11:26:33 2019]  __x64_sys_ioctl+0x1a/0x20
> [Tue Apr  2 11:26:33 2019]  do_syscall_64+0x60/0x180
> [Tue Apr  2 11:26:33 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> According to my analysis, there is a bug in propagating errors for an
> iSCSI session timeout (i.e., session_recovery_timedout). By contrast,
> the legacy SCSI queue handles the errors in scsi_request_fn by killing
> the request, as can be seen below:
>
> 1872 static void scsi_request_fn(struct request_queue *q)
> 1873         __releases(q->queue_lock)
> 1874         __acquires(q->queue_lock)
> 1875 {
> ..
> 1886         for (;;) {
> ..
> 1893                 req = blk_peek_request(q);
> 1894                 if (!req)
> 1895                         break;
> 1896
> 1897                 if (unlikely(!scsi_device_online(sdev))) {
> 1898                         sdev_printk(KERN_ERR, sdev,
> 1899                                     "scsi_request_fn: rejecting I/O to offline device\n");
> 1900                         scsi_kill_request(req, q);
> 1901                         continue;
> 1902                 }
> 1903
>
> I am not sure in which layer we have to fix this, the LLD or scsi_queue_rq().
>
> Could someone please take a look? Or, could someone guide me on how to
> fix this bug?
>
> Thanks,
>
> Jaesoo Lee