On 9/27/21 10:10 PM, Yi Zhang wrote:
Hi Bart,

Bisect shows this issue was introduced by the commit below. BTW, it is always reproducible on the s390x KVM environment:

commit 65ca846a53149a1a72cd8d02e7b2e73dd545b834
Author: Bart Van Assche <bvanassche@xxxxxxx>
Date:   Wed Jan 22 19:56:34 2020 -0800

    scsi: core: Introduce {init,exit}_cmd_priv()

    The current behavior of the SCSI core is to clear driver-private data
    before preparing a request for submission to the SCSI LLD. Make it
    possible for SCSI LLDs to disable clearing of driver-private data.

    These hooks will be used by a later patch, namely "scsi: ufs: Let the
    SCSI core allocate per-command UFS data".

(gdb) l *(scsi_mq_exit_request+0x2c)
0x8d7be4 is in scsi_mq_exit_request (drivers/scsi/scsi_lib.c:1780).
1775                                   unsigned int hctx_idx)
1776    {
1777            struct Scsi_Host *shost = set->driver_data;
1778            struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
1779
1780            if (shost->hostt->exit_cmd_priv)
1781                    shost->hostt->exit_cmd_priv(shost, cmd);
1782            kmem_cache_free(scsi_sense_cache, cmd->sense_buffer);
1783    }
1784
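The optional-hook pattern visible in the scsi_mq_exit_request() listing above (call exit_cmd_priv() only if the host template provides it, then free the sense buffer) can be sketched in user space as follows. This is a hypothetical mock, not kernel code: the names mock_cmd, mock_host, mock_host_template, mock_exit_request() and demo_exit_cmd_priv() are invented for illustration, and free() stands in for kmem_cache_free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct scsi_cmnd, struct Scsi_Host and
 * struct scsi_host_template; only the fields this sketch needs. */
struct mock_host;

struct mock_cmd {
    unsigned char *sense_buffer;
    int priv_released;          /* set by the driver hook in this mock */
};

struct mock_host_template {
    /* Optional per-command teardown hook; NULL if the driver has none. */
    int (*exit_cmd_priv)(struct mock_host *host, struct mock_cmd *cmd);
};

struct mock_host {
    const struct mock_host_template *hostt;
};

/* Mirrors the shape of the quoted scsi_mq_exit_request(): invoke the
 * driver hook only when one is registered, then release the sense
 * buffer (free() here instead of kmem_cache_free()). */
static void mock_exit_request(struct mock_host *host, struct mock_cmd *cmd)
{
    /* In the kernel, a dangling host or template pointer here would
     * fault on the dereference; the mock just asserts validity. */
    assert(host != NULL && host->hostt != NULL);
    if (host->hostt->exit_cmd_priv)
        host->hostt->exit_cmd_priv(host, cmd);
    free(cmd->sense_buffer);
    cmd->sense_buffer = NULL;
}

/* Hypothetical driver hook that records that it ran. */
static int demo_exit_cmd_priv(struct mock_host *host, struct mock_cmd *cmd)
{
    (void)host;
    cmd->priv_released = 1;
    return 0;
}
```

A driver that does not set exit_cmd_priv (a NULL member) simply skips the hook, so only the sense buffer is released for its commands.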
Hi Yi,

Thank you for having taken the time to run a bisect. However, I strongly doubt that the bisection result is correct. If there were anything wrong with the above patch, it would already have been noticed on other architectures. I recommend proceeding as follows:

* Verify whether the reported issue occurs only with the stable kernel series or also with mainline kernels.
* Work with the soft-iWARP author to improve the reliability of the siw driver. If I run blktests in an x86 VM, the following warning appears sporadically in the kernel log:

    ------------[ cut here ]------------
    WARNING: CPU: 18 PID: 5462 at drivers/infiniband/sw/siw/siw_cm.c:255 __siw_cep_dealloc+0x184/0x190 [siw]
    CPU: 1 PID: 5462 Comm: kworker/u144:13 Tainted: G E 5.15.0-rc2-dbg+ #7
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
    Workqueue: iw_cm_wq cm_work_handler [iw_cm]
    RIP: 0010:__siw_cep_dealloc+0x184/0x190 [siw]
    Call Trace:
     siw_cep_put+0x5c/0x80 [siw]
     siw_reject+0x13c/0x230 [siw]
     iw_cm_reject+0xac/0x130 [iw_cm]
     cm_conn_req_handler+0x4f1/0x7d0 [iw_cm]
     cm_work_handler+0x885/0x9c0 [iw_cm]
     process_one_work+0x535/0xad0
     worker_thread+0x2e7/0x700
     kthread+0x1f6/0x220
     ret_from_fork+0x1f/0x30
    irq event stamp: 11449266
    hardirqs last  enabled at (11449265): [<ffffffff81fc4248>] _raw_spin_unlock_irq+0x28/0x50
    hardirqs last disabled at (11449266): [<ffffffff81fb7e44>] __schedule+0x5f4/0xbb0
    softirqs last  enabled at (11449176): [<ffffffffa06d142f>] p_fill_from_dev_buffer+0xff/0x140 [scsi_debug]
    softirqs last disabled at (11449168): [<ffffffffa06d1400>] p_fill_from_dev_buffer+0xd0/0x140 [scsi_debug]
    ---[ end trace b23871487c995b72 ]---

* Use the rdma_rxe driver to run blktests since, at least in my experience, that driver is more reliable than the soft-iWARP driver.

Thanks,

Bart.