On 26/12/2024 21:17, Shinichiro Kawasaki wrote:
On Dec 25, 2024 / 17:37, Li Zhijian wrote:
Hi, Shin'ichiro
All your comments have been addressed except the one about the success
ratio. Could you help check this patch ([NOT-FOR-MERGE], just for
testing), which can tell where it fails in your environment?
I tested it today in my QEMU environment, and it succeeds almost 100%
of the time.
Thanks for this effort. I ran rnbd/001 with this series in my QEMU
environment. It still fails. Please find the generated 001.out.bad
file [X]. The kernel was v6.13-rc4 with the fix patch "RDMA/ulp: Add
missing deinit() call".
I wonder what is the difference between your environment and mine.
FYI, my QEMU environment has 4 CPUs and 16GB of DRAM and runs Fedora 40.
I also attach the kernel config I used, in case you are interested.
Due to this bug, I cannot finish rnbd/001 at all.
However, I can reproduce your log by adding `_start_rnbd_client` before
the iteration. It can be fixed by calling `_stop_rnbd_client` regardless
of whether `_start_rnbd_client` succeeds (please feel free to give it a
try when you have the opportunity):
diff --git a/tests/rnbd/001 b/tests/rnbd/001
index 9c6d56e3ee98..321c4c010e78 100755
--- a/tests/rnbd/001
+++ b/tests/rnbd/001
@@ -26,6 +26,7 @@ test_start_stop()
 	local loop_dev i j=0
 	loop_dev="$(losetup -f)"
+	_start_rnbd_client # this makes the _start_rnbd_client in the iteration below fail
 	for ((i=0;i<100;i++))
 	do
 		if _start_rnbd_client "${loop_dev}" &>/dev/null; then
@@ -33,6 +34,7 @@ test_start_stop()
 			_stop_rnbd_client &>/dev/null && echo 'disconnect ok' || echo 'disconnect not ok'
 			((j++))
 		else
+			_stop_rnbd_client # always stop rnbd so that we can connect again
 			echo 'connect not ok'
 		fi
 	done
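To illustrate why the unconditional `_stop_rnbd_client` matters, here is a minimal standalone sketch (not part of blktests, and not the real helpers). `_start_rnbd_client` and `_stop_rnbd_client` are replaced by hypothetical stubs that only track a `connected` flag, under the assumption, matching the reproduction described above, that a new mapping fails while a stale one is still present. The loop is shortened to 5 iterations.

```shell
#!/bin/bash
# Hypothetical stubs standing in for the blktests helpers. The real
# _start_rnbd_client/_stop_rnbd_client map and unmap an rnbd device;
# these only track whether a mapping is currently present.
connected=0

_start_rnbd_client() {
	# Assumption: mapping fails while a previous mapping still exists.
	if ((connected)); then
		return 1
	fi
	connected=1
}

_stop_rnbd_client() {
	connected=0
}

run_loop() {
	local i j=0
	for ((i = 0; i < 5; i++)); do
		if _start_rnbd_client; then
			echo 'connect ok'
			_stop_rnbd_client && echo 'disconnect ok' || echo 'disconnect not ok'
			((j++))
		else
			# Tear down unconditionally so the next iteration can connect.
			_stop_rnbd_client
			echo 'connect not ok'
		fi
	done
	echo "connected $j/5"
}

# Leave a stale mapping behind, like the extra call in the diff above.
_start_rnbd_client
run_loop
```

With the stubs, the run prints a single 'connect not ok' and then recovers, ending with "connected 4/5". If the `_stop_rnbd_client` call in the else branch is removed, the stale mapping is never torn down and every iteration prints 'connect not ok', the same pattern as in 001.out.bad.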
===========================
[ 27.864420] run blktests rnbd/001 at 2024-12-27 13:21:37
[ 27.888742] infiniband eth0_rxe: set active
[ 27.889497] infiniband eth0_rxe: added eth0
[ 27.910304] rnbd_client L599: Mapping device /dev/loop0 on session blktest, (access_mode: rw, nr_poll_queues: 0)
[ 27.924065] rnbd_client L1190: [session=blktest] mapped 4/4 default/read queues.
[ 27.925825] rnbd_server L782: </dev/loop0@blktest>: Opened device 'loop0'
[ 27.927554] rnbd_client L1612: </dev/loop0@blktest> map_device: Device mapped as rnbd0 (nsectors: 0, logical_block_size: 512, physical_block_size: 512, max_write_zeroes_sectors: 0, max_discard_sectors: 0, discard_granularity: 512, discard_alignment: 0, secure_discard: 0, max_segments: 128, max_hw_sectors: 248, wc: 0, fua: 0)
[ 27.938295] rnbd_client L323: </dev/loop0@blktest> Unmapping device, option: normal.
[ 27.962570] rnbd_server L238: </dev/loop0@blktest>: Device closed
[ 27.967500] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 27.976554] #PF: supervisor read access in kernel mode
[ 27.984926] #PF: error_code(0x0000) - not-present page
[ 27.989126] PGD 0 P4D 0
[ 27.991067] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[ 27.993226] CPU: 3 UID: 0 PID: 304 Comm: kworker/u20:2 Not tainted 6.13.0-rc3+ #1
[ 27.996697] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 27.999333] Workqueue: rxe_wq do_work [rdma_rxe]
[ 28.000309] RIP: 0010:memcpy_orig+0xd5/0x140
[ 28.001304] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66
[ 28.004932] RSP: 0018:ffffb934c0643cc0 EFLAGS: 00010246
[ 28.005845] RAX: ffff976bc1e12d5a RBX: 0000000000000000 RCX: 0000000000000000
[ 28.007090] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff976bc1e12d5a
[ 28.008380] RBP: ffff976bc1e12d5a R08: 0000000000000001 R09: 0000000000000001
[ 28.009639] R10: 0000000000000005 R11: 0000000000000000 R12: 0000000080000000
[ 28.010836] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000008
[ 28.011948] FS: 0000000000000000(0000) GS:ffff976f2fd80000(0000) knlGS:0000000000000000
[ 28.013335] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 28.014275] CR2: 0000000000000000 CR3: 00000001837da002 CR4: 00000000001706f0
[ 28.015424] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 28.016598] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 28.017728] Call Trace:
[ 28.018114] <TASK>
[ 28.018453] ? __die_body.cold+0x19/0x27
[ 28.019167] ? page_fault_oops+0x15a/0x2d0
[ 28.019861] ? search_module_extables+0x19/0x60
[ 28.020617] ? search_bpf_extables+0x5f/0x80
[ 28.021611] ? exc_page_fault+0x7e/0x180
[ 28.022488] ? asm_exc_page_fault+0x26/0x30
[ 28.023547] ? memcpy_orig+0xd5/0x140
[ 28.024396] rxe_mr_copy+0x1c3/0x200 [rdma_rxe]
[ 28.025476] ? rxe_pool_get_index+0x4b/0x80 [rdma_rxe]
[ 28.026612] copy_data+0xa5/0x230 [rdma_rxe]
[ 28.027611] rxe_requester+0xd9b/0xf70 [rdma_rxe]
[ 28.028727] ? finish_task_switch.isra.0+0x99/0x2e0
[ 28.029878] rxe_sender+0x13/0x40 [rdma_rxe]
[ 28.030920] do_task+0x68/0x1e0 [rdma_rxe]
[ 28.031893] process_one_work+0x177/0x330
[ 28.032854] worker_thread+0x252/0x390
[ 28.033748] ? __pfx_worker_thread+0x10/0x10
[ 28.034665] kthread+0xd2/0x100
[ 28.035382] ? __pfx_kthread+0x10/0x10
[ 28.036252] ret_from_fork+0x34/0x50
[ 28.037220] ? __pfx_kthread+0x10/0x10
[ 28.038072] ret_from_fork_asm+0x1a/0x30
[ 28.038991] </TASK>
[ 28.039543] Modules linked in: loop rnbd_client rtrs_client rnbd_server rtrs_server rtrs_core rdma_cm iw_cm ib_cm rdma_rxe ib_uverbs ib_core ip6_udp_tunnel udp_tunnel rfkill intel_rapl_msr intel_rapl_common kmem rapl cxl_mem iTCO_wdt intel_pmc_bxt cxl_pmem dax_hmem iTCO_vendor_support device_dax cxl_acpi cxl_pci cxl_port joydev qxl cxl_core pcspkr drm_ttm_helper lpc_ich ttm i2c_i801 virtio_balloon i2c_smbus nd_pmem nd_btt dax_pmem einj ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtiofs fuse virtio_net nfit virtio_console net_failover libnvdimm serio_raw virtio_blk failover qemu_fw_cfg dm_multipath sunrpc
[ 28.051034] CR2: 0000000000000000
[ 28.052072] ---[ end trace 0000000000000000 ]---
[ 28.053099] RIP: 0010:memcpy_orig+0xd5/0x140
[ 28.054188] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66
[ 28.058290] RSP: 0018:ffffb934c0643cc0 EFLAGS: 00010246
[ 28.059514] RAX: ffff976bc1e12d5a RBX: 0000000000000000 RCX: 0000000000000000
[ 28.061194] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff976bc1e12d5a
[ 28.062588] RBP: ffff976bc1e12d5a R08: 0000000000000001 R09: 0000000000000001
[X]
001.out.bad
----------------------------------------------------------------------------
Running rnbd/001
connect ok
disconnect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
Failed: 1/100
Test complete