Re: [PATCH blktests v2 2/2] [NOT-FOR-MERGE] just for testing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 27.12.24 06:37, Zhijian Li (Fujitsu) wrote:
Hi, Shin'ichiro,

Your attached kconfig+this rnbd test triggered another BUG.

Cced: RDMA

Is this a known issue in RDMA/RXE communities?

From my side, it should not be a known issue. It seems that it is related with this rnbd test.

In previous tests, this problem does not appear.

I am not sure if others also find this problem or not. To me, it is my first time to find this problem.

Zhu Yanjun



On 26/12/2024 21:17, Shinichiro Kawasaki wrote:
On Dec 25, 2024 / 17:37, Li Zhijian wrote:
Hi, Shin'ichiro

All your comments has been addressed except the success ratio one. Could
you help to check this patch([NOT-FOR-MERGE] just for testing) that can tell
where it fails at in your envrionment.

I tested it today in my QEMU enviroment, It almost 100% success

Thanks for this effort. I ran rnbd/001 with this series in my QEMU environment.
It looks still failing. Please find the 001.out.bad file generated [X]. The
kernel was v6.13-rc4 with the fix patch "RDMA/ulp: Add missing deinit() call".

I wonder what is the difference between your environment and mine. FYI, my QEMU
environment has 4 CPUs and 16GB DRAM. It runs Fedora 40. I also attach the
kernel config I used just in case you are interested in.


Due to this bug, I cannot finish rnbd/001 at all.

However, I can reproduce your log by adding `_start_rnbd_client` before the iteration.
And it can be fixed by calling `_stop_rnbd_client` regardless of whether `_start_rnbd_client`
succeeds or not(Please feel free to give it a try when you have the opportunity).

diff --git a/tests/rnbd/001 b/tests/rnbd/001
index 9c6d56e3ee98..321c4c010e78 100755
--- a/tests/rnbd/001
+++ b/tests/rnbd/001
@@ -26,6 +26,7 @@ test_start_stop()
          local loop_dev i j=0
          loop_dev="$(losetup -f)"
+ _start_rnbd_client # this makes the _start_rnbd_client in below iteration fails
          for ((i=0;i<100;i++))
          do
                  if _start_rnbd_client "${loop_dev}" &>/dev/null; then
@@ -33,6 +34,7 @@ test_start_stop()
                          _stop_rnbd_client &>/dev/null && echo 'disconnect ok' || echo 'disconnect not ok'
                          ((j++))
                  else
+                       _stop_rnbd_client  # always stop rnbd so that we can connect again.
                          echo 'connect not ok'
                  fi
          done

===========================

[   27.864420] run blktests rnbd/001 at 2024-12-27 13:21:37
[   27.888742] infiniband eth0_rxe: set active
[   27.889497] infiniband eth0_rxe: added eth0
[   27.910304] rnbd_client L599: Mapping device /dev/loop0 on session blktest, (access_mode: rw, nr_poll_queues: 0)
[   27.924065] rnbd_client L1190: [session=blktest] mapped 4/4 default/read queues.
[   27.925825] rnbd_server L782: </dev/loop0@blktest>: Opened device 'loop0'
[   27.927554] rnbd_client L1612: </dev/loop0@blktest> map_device: Device mapped as rnbd0 (nsectors: 0, logical_block_size: 512, physical_block_size: 512, max_write_zeroes_sectors: 0, max_discard_sectors: 0, discard_granularity: 51
2, discard_alignment: 0, secure_discard: 0, max_segments: 128, max_hw_sectors: 248, wc: 0, fua: 0)
[   27.938295] rnbd_client L323: </dev/loop0@blktest> Unmapping device, option: normal.
[   27.962570] rnbd_server L238: </dev/loop0@blktest>: Device closed
[   27.967500] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   27.967500] BUG: kernel NULL pointer dereference, address: 0000000000000000                                                                                                                                       13:21:38 [11/9189]
[   27.976554] #PF: supervisor read access in kernel mode
[   27.984926] #PF: error_code(0x0000) - not-present page
[   27.989126] PGD 0 P4D 0
[   27.991067] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[   27.993226] CPU: 3 UID: 0 PID: 304 Comm: kworker/u20:2 Not tainted 6.13.0-rc3+ #1
[   27.996697] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   27.999333] Workqueue: rxe_wq do_work [rdma_rxe]
[   28.000309] RIP: 0010:memcpy_orig+0xd5/0x140
[   28.001304] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66
[   28.004932] RSP: 0018:ffffb934c0643cc0 EFLAGS: 00010246
[   28.005845] RAX: ffff976bc1e12d5a RBX: 0000000000000000 RCX: 0000000000000000
[   28.007090] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff976bc1e12d5a
[   28.008380] RBP: ffff976bc1e12d5a R08: 0000000000000001 R09: 0000000000000001
[   28.009639] R10: 0000000000000005 R11: 0000000000000000 R12: 0000000080000000
[   28.010836] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000008
[   28.011948] FS:  0000000000000000(0000) GS:ffff976f2fd80000(0000) knlGS:0000000000000000
[   28.013335] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.014275] CR2: 0000000000000000 CR3: 00000001837da002 CR4: 00000000001706f0
[   28.015424] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   28.016598] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   28.017728] Call Trace:
[   28.018114]  <TASK>
[   28.018453]  ? __die_body.cold+0x19/0x27
[   28.019167]  ? page_fault_oops+0x15a/0x2d0
[   28.019861]  ? search_module_extables+0x19/0x60
[   28.020617]  ? search_bpf_extables+0x5f/0x80
[   28.021611]  ? exc_page_fault+0x7e/0x180
[   28.022488]  ? asm_exc_page_fault+0x26/0x30
[   28.023547]  ? memcpy_orig+0xd5/0x140
[   28.024396]  rxe_mr_copy+0x1c3/0x200 [rdma_rxe]
[   28.025476]  ? rxe_pool_get_index+0x4b/0x80 [rdma_rxe]
[   28.026612]  copy_data+0xa5/0x230 [rdma_rxe]
[   28.027611]  rxe_requester+0xd9b/0xf70 [rdma_rxe]
[   28.028727]  ? finish_task_switch.isra.0+0x99/0x2e0
[   28.029878]  rxe_sender+0x13/0x40 [rdma_rxe]
[   28.030920]  do_task+0x68/0x1e0 [rdma_rxe]
[   28.031893]  process_one_work+0x177/0x330
[   28.032854]  worker_thread+0x252/0x390
[   28.033748]  ? __pfx_worker_thread+0x10/0x10
[   28.034665]  kthread+0xd2/0x100
[   28.035382]  ? __pfx_kthread+0x10/0x10
[   28.036252]  ret_from_fork+0x34/0x50
[   28.037220]  ? __pfx_kthread+0x10/0x10
[   28.038072]  ret_from_fork_asm+0x1a/0x30
[   28.038991]  </TASK>
[   28.039543] Modules linked in: loop rnbd_client rtrs_client rnbd_server rtrs_server rtrs_core rdma_cm iw_cm ib_cm rdma_rxe ib_uverbs ib_core ip6_udp_tunnel udp_tunnel rfkill intel_rapl_msr intel_rapl_common kmem rapl cxl_mem iTC
O_wdt intel_pmc_bxt cxl_pmem dax_hmem iTCO_vendor_support device_dax cxl_acpi cxl_pci cxl_port joydev qxl cxl_core pcspkr drm_ttm_helper lpc_ich ttm i2c_i801 virtio_balloon i2c_smbus nd_pmem nd_btt dax_pmem einj ip_tables crct10dif
_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtiofs fuse virtio_net nfit virtio_console net_failover libnvdimm serio_raw virtio_blk failover qemu_fw_cf
g dm_multipath sunrpc
[   28.051034] CR2: 0000000000000000
[   28.052072] ---[ end trace 0000000000000000 ]---
[   28.053099] RIP: 0010:memcpy_orig+0xd5/0x140
[   28.054188] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66
[   28.058290] RSP: 0018:ffffb934c0643cc0 EFLAGS: 00010246
[   28.059514] RAX: ffff976bc1e12d5a RBX: 0000000000000000 RCX: 0000000000000000
[   28.061194] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff976bc1e12d5a
[   28.062588] RBP: ffff976bc1e12d5a R08: 0000000000000001 R09: 0000000000000001






[X]

001.out.bad
----------------------------------------------------------------------------
Running rnbd/001
connect ok
disconnect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
connect not ok
Failed: 1/100
Test complete





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux