On Apr 14, 2023 / 17:58, Shin'ichiro Kawasaki wrote:
> On Apr 14, 2023 / 09:33, John Garry wrote:
[...]
> > The failure may be due to one of my changes. Please see
> > https://lore.kernel.org/lkml/5bdbfbbc-bac1-84a1-5f50-33a443e3292a@xxxxxxxxxx/
>
> Thanks for the notice. I think your changes were applied to 6.4/scsi-queue,
> which I've not yet tried. Then it should not be related to your changes.

I took a closer look at your changes for kernel v6.4, and noticed that they
might affect the scsi/007 failure I observed with kernel v6.3-rcX. I did some
trials and found the following:

- On kernel v6.3-rc7 without your changes, the test case scsi/007 fails with
  an unexpected read command success (the failure I found and reported).

- On kernel v6.3-rc7 with your changes up to "scsi: scsi_debug: Dynamically
  allocate sdebug_queued_cmd" [1], scsi/007 fails and causes a system hang.
  The kernel reported "BUG sdebug_queued_cmd". When I revert [1] from the
  kernel, the failure symptom is the same as with v6.3-rc7 (no hang, no BUG).

- On kernel v6.3-rc7 with your changes including [1] and "scsi: scsi_debug:
  Abort commands from scsi_debug_device_reset()" [2], scsi/007 passes.

[1] https://lore.kernel.org/lkml/20230327074310.1862889-7-john.g.garry@xxxxxxxxxx/
[2] https://lore.kernel.org/linux-scsi/20230416175654.159163-1-john.g.garry@xxxxxxxxxx/

Your fix [2] was intended to fix the BUG that [1] caused, but it also fixed
the scsi/007 failure I found :)

To understand the failure more deeply, I added debug prints to scsi_debug,
using kernel v6.3-rc7 with your changes just before [1]. This kernel does not
have the fix [2], so it does not abort commands at device reset. When the SCSI
error handler does a BDR (bus device reset), scsi_debug does not cancel the
hrtimers for the commands already issued to it. Such an hrtimer stays alive
across the reset. When that hrtimer expires, scsi_debug completes the command
that was issued _after_ BDR.
The hrtimer for the command issued before BDR completes the command issued
after BDR, since the two commands reuse the same scsi_cmnd and rq objects.
As a result, the command issued after BDR completes earlier than expected,
which results in the unexpected read command success and the scsi/007
failure.

After applying the fix [2], scsi_debug cancels hrtimers at reset. Then the
hrtimers started before reset no longer affect the commands issued after
reset.

These findings mean that the scsi/007 failure I found with kernel v6.3-rc7
indicated a bug in scsi_debug, and the commit [2] fixed it. I no longer think
a blktests-side fix for scsi/007 is required. Good :)
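
For illustration only, the stale-timer mechanism described above can be
sketched as a small Python model. This is not scsi_debug code: the Cmd and
FakeDevice classes, the cancel_timers_on_reset flag and the tick values are
all hypothetical names and numbers made up for this sketch. It only assumes
what is described above: a timer keeps pointing at a command object, and the
same object is reused for the command issued after the reset.

```python
import heapq


class Cmd:
    """Hypothetical stand-in for a reused scsi_cmnd/rq object."""

    def __init__(self):
        self.completed_at = None


class FakeDevice:
    """Toy device that completes commands from per-command timers."""

    def __init__(self, cancel_timers_on_reset):
        self.cancel_timers_on_reset = cancel_timers_on_reset
        self.now = 0
        self.timers = []  # heap of (expiry, sequence, cmd)
        self.seq = 0

    def queue(self, cmd, delay):
        # (Re)issue the command object and arm a completion timer for it.
        cmd.completed_at = None
        heapq.heappush(self.timers, (self.now + delay, self.seq, cmd))
        self.seq += 1

    def reset(self):
        # Models the device reset: only the fixed variant drops old timers.
        if self.cancel_timers_on_reset:
            self.timers.clear()

    def run_until(self, t):
        # Fire every timer that expires up to time t.
        while self.timers and self.timers[0][0] <= t:
            expiry, _, cmd = heapq.heappop(self.timers)
            self.now = expiry
            if cmd.completed_at is None:
                # The timer only knows the object, not which issue of it.
                cmd.completed_at = expiry
        self.now = t


def scenario(cancel_timers_on_reset):
    dev = FakeDevice(cancel_timers_on_reset)
    cmd = Cmd()
    dev.queue(cmd, delay=10)  # first issue, timer expires at t=10
    dev.run_until(3)          # command is considered timed out here
    dev.reset()               # error handler resets the device
    dev.run_until(4)
    dev.queue(cmd, delay=10)  # same object reissued, expected at t=14
    dev.run_until(20)
    return cmd.completed_at


print("without cancel:", scenario(False))
print("with cancel:   ", scenario(True))
```

Without cancellation, the timer armed for the first issue fires at t=10 and
completes the reissued command four ticks earlier than its own timer would.
With cancellation at reset, the reissued command completes at t=14 as
expected.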