On Wed, Sep 29, 2021 at 2:07 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:
>
> On 9/27/21 10:10 PM, Yi Zhang wrote:
> > Hi Bart
> >
> > Bisect shows this issue was introduced by the commit below. BTW, it
> > always reproduces in the s390x KVM environment:
> >
> > commit 65ca846a53149a1a72cd8d02e7b2e73dd545b834
> > Author: Bart Van Assche <bvanassche@xxxxxxx>
> > Date:   Wed Jan 22 19:56:34 2020 -0800
> >
> >     scsi: core: Introduce {init,exit}_cmd_priv()
> >
> >     The current behavior of the SCSI core is to clear driver-private data
> >     before preparing a request for submission to the SCSI LLD. Make it possible
> >     for SCSI LLDs to disable clearing of driver-private data.
> >
> >     These hooks will be used by a later patch, namely "scsi: ufs: Let the SCSI
> >     core allocate per-command UFS data".
> >
> > (gdb) l *(scsi_mq_exit_request+0x2c)
> > 0x8d7be4 is in scsi_mq_exit_request (drivers/scsi/scsi_lib.c:1780).
> > 1775                             unsigned int hctx_idx)
> > 1776    {
> > 1777            struct Scsi_Host *shost = set->driver_data;
> > 1778            struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
> > 1779
> > 1780            if (shost->hostt->exit_cmd_priv)
> > 1781                    shost->hostt->exit_cmd_priv(shost, cmd);
> > 1782            kmem_cache_free(scsi_sense_cache, cmd->sense_buffer);
> > 1783    }
> > 1784
>
> Hi Yi,
>
> Thank you for having taken the time to run a bisect. However, I strongly doubt
> that the bisection result is correct. If there would be anything wrong with the
> above patch it would already have been noticed on other architectures. I
> recommend to proceed as follows:
> * Verify whether the reported issue only occurs with the stable kernel series or
>   also with mainline kernels.

This can be reproduced on both stable kernels and mainline kernels.

> * Work with the soft-iWARP author to improve the reliability of the siw driver.
>   If I run blktests in an x86 VM then the following appears sporadically in
>   the kernel log:
>
> ------------[ cut here ]------------
> WARNING: CPU: 18 PID: 5462 at drivers/infiniband/sw/siw/siw_cm.c:255 __siw_cep_dealloc+0x184/0x190 [siw]
> CPU: 1 PID: 5462 Comm: kworker/u144:13 Tainted: G E 5.15.0-rc2-dbg+ #7
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> RIP: 0010:__siw_cep_dealloc+0x184/0x190 [siw]
> Call Trace:
>  siw_cep_put+0x5c/0x80 [siw]
>  siw_reject+0x13c/0x230 [siw]
>  iw_cm_reject+0xac/0x130 [iw_cm]
>  cm_conn_req_handler+0x4f1/0x7d0 [iw_cm]
>  cm_work_handler+0x885/0x9c0 [iw_cm]
>  process_one_work+0x535/0xad0
>  worker_thread+0x2e7/0x700
>  kthread+0x1f6/0x220
>  ret_from_fork+0x1f/0x30
> irq event stamp: 11449266
> hardirqs last enabled at (11449265): [<ffffffff81fc4248>] _raw_spin_unlock_irq+0x28/0x50
> hardirqs last disabled at (11449266): [<ffffffff81fb7e44>] __schedule+0x5f4/0xbb0
> softirqs last enabled at (11449176): [<ffffffffa06d142f>] p_fill_from_dev_buffer+0xff/0x140 [scsi_debug]
> softirqs last disabled at (11449168): [<ffffffffa06d1400>] p_fill_from_dev_buffer+0xd0/0x140 [scsi_debug]
> ---[ end trace b23871487c995b72 ]---
>
> * Use the rdma_rxe driver to run blktests since at least in my experience that
>   driver is more reliable than the soft-iWARP driver.
>

I would suggest reproducing it on the s390x platform, since in my testing
it was easy to reproduce there. According to the CKI test history, it has
also been reproduced on ppc64le/aarch64 with rdma_rxe.

BTW, I've verified this issue with Ming's patch on s390x. Thanks for
looking into this issue.
https://lore.kernel.org/linux-scsi/20210930124415.1160754-1-ming.lei@xxxxxxxxxx/T/#u

> Thanks,
>
> Bart.
>

--
Best Regards,
Yi Zhang