Saurav, Nilesh,

Can you please look into this?
It seems we have a major regression in the 6.5 QEDF driver.

/John

On 6/5/23 23:17, Guangwu Zhang wrote:
Hi,

qedf IO testing found the error with latest linux-block/for-next, please have a look.

kernel repo : https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
commit: Merge branch 'for-6.5/block' into for-next

[ 7305.031233] ------------[ cut here ]------------
[ 7305.033749] [0000:04:00.2]:[qedf_process_error_detect:1525]:11: tx_buff_off=00000000, rx_buff_off=00000000, rx_id=073f
[ 7305.038904] refcount_t: underflow; use-after-free.
[ 7305.038918] WARNING: CPU: 23 PID: 35995 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[ 7305.065853] Modules linked in: nfsv3 nfs_acl bnx2fc cnic uio rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs sunrpc vfat fat dm_service_time dm_multipath intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mgag200 i2c_algo_bit drm_shmem_helper dell_wmi_descriptor drm_kms_helper ipmi_ssif ledtrig_audio sparse_keymap irqbypass rfkill rapl video syscopyarea intel_cstate mei_me ipmi_si sysfillrect mei intel_uncore iTCO_wdt sysimgblt pcspkr iTCO_vendor_support dcdbas mxm_wmi ipmi_devintf lpc_ich ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg qede qedf crct10dif_pclmul crc32_pclmul crc32c_intel qed ahci libahci ghash_clmulni_intel libata libfcoe libfc tg3 megaraid_sas scsi_transport_fc crc8 wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tls]
[ 7305.151246] CPU: 23 PID: 35995 Comm: kworker/23:0 Kdump: loaded Not tainted 6.4.0-rc3+ #1
[ 7305.160379] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 2.16.0 07/20/2022
[ 7305.168832] Workqueue: qedf_io_wq qedf_fp_io_handler [qedf]
[ 7305.175069] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 7305.180911] Code: 01 01 e8 c9 4d ae ff 0f 0b c3 cc cc cc cc 80 3d 1a ae 6f 01 00 75 85 48 c7 c7 d8 e3 bb ac c6 05 0a ae 6f 01 01 e8 a6 4d ae ff <0f> 0b c3 cc cc cc cc 80 3d f5 ad 6f 01 00 0f 85 5e ff ff ff 48 c7
[ 7305.201873] RSP: 0018:ffff9cc488507e80 EFLAGS: 00010282
[ 7305.207708] RAX: 0000000000000000 RBX: ffff904a44ac7410 RCX: 0000000000000027
[ 7305.215685] RDX: ffff904e2fcdf848 RSI: 0000000000000001 RDI: ffff904e2fcdf840
[ 7305.223654] RBP: ffff9046cc488040 R08: 80000000ffff9b21 R09: 657466612d657375
[ 7305.231623] R10: 203b776f6c667265 R11: 646e75203a745f74 R12: ffff904e2fcf1880
[ 7305.239591] R13: ffffbcc47fcc7500 R14: 0000000000000000 R15: ffffbcc47fcc7505
[ 7305.247559] FS: 0000000000000000(0000) GS:ffff904e2fcc0000(0000) knlGS:0000000000000000
[ 7305.256595] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7305.263011] CR2: 0000563088c3b008 CR3: 0000000102bfc006 CR4: 00000000001706e0
[ 7305.267445] [0000:04:00.2]:[qedf_process_error_detect:1519]:11: Error detection CQE, xid=0x743
[ 7305.270983] Call Trace:
[ 7305.280598] [0000:04:00.2]:[qedf_process_error_detect:1521]:11: err_warn_bitmap=00000040:00000000
[ 7305.283328] <TASK>
[ 7305.283330] qedf_fp_io_handler+0x40/0x50 [qedf]
[ 7305.293237] [0000:04:00.2]:[qedf_process_error_detect:1525]:11: tx_buff_off=00000000, rx_buff_off=00000000, rx_id=04ec
[ 7305.295574] process_one_work+0x1e5/0x3f0
[ 7305.296755] [0000:04:00.2]:[qedf_process_error_detect:1519]:11: Error detection CQE, xid=0x194
[ 7305.296761] [0000:04:00.2]:[qedf_process_error_detect:1521]:11: err_warn_bitmap=00000040:00000000
[ 7305.296764] [0000:04:00.2]:[qedf_process_error_detect:1525]:11: tx_buff_off=00000000, rx_buff_off=00000000, rx_id=013d
[ 7305.324202] [0000:04:00.2]:[qedf_process_error_detect:1519]:11: Error detection CQE, xid=0x37c
[ 7305.326752] ? __pfx_worker_thread+0x10/0x10
[ 7305.336659] [0000:04:00.2]:[qedf_process_error_detect:1521]:11: err_warn_bitmap=00000040:00000000
[ 7305.337543] [0000:04:00.2]:[qedf_process_error_detect:1519]:11: Error detection CQE, xid=0x47b
[ 7305.337549] [0000:04:00.2]:[qedf_process_error_detect:1521]:11: err_warn_bitmap=00000000:00004000
[ 7305.337552] [0000:04:00.2]:[qedf_process_error_detect:1525]:11: tx_buff_off=00000000, rx_buff_off=00000000, rx_id=06ed
[ 7305.348603] worker_thread+0x50/0x3a0
[ 7305.358219] [0000:04:00.2]:[qedf_process_error_detect:1525]:11: tx_buff_off=00000000, rx_buff_off=00000000, rx_id=04ed
[ 7305.362987] ? __pfx_worker_thread+0x10/0x10
[ 7305.373920] [0000:04:00.2]:[qedf_process_error_detect:1519]:11: Error detection CQE, xid=0x731
[ 7305.382497] kthread+0xe2/0x110
[ 7305.382502] ? __pfx_kthread+0x10/0x10
[ 7305.382505] ret_from_fork+0x2c/0x50
[ 7305.382511] </TASK>
[ 7305.382512] ---[ end trace 0000000000000000 ]---