[bug report] rdma_rxe: kernel NULL pointer observed with blktests nvme/012 on ppc64le/aarch64

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello

I found this bug with blktests nvme/012 on ppc64le/aarch64, could anyone help check it,
Let me know if you need any test for further investigation, thanks.

ppc64le:
[  155.427446] run blktests nvme/012 at 2020-10-18 09:54:03
[  156.593195] rdma_rxe: loaded
[  156.614836] infiniband rxe0: set active
[  156.614843] infiniband rxe0: added env2
[  156.617421] lo speed is unknown, defaulting to 1000
[  156.617449] lo speed is unknown, defaulting to 1000
[  156.617484] lo speed is unknown, defaulting to 1000
[  156.619911] infiniband rxe1: set active
[  156.619916] infiniband rxe1: added lo
[  156.619987] lo speed is unknown, defaulting to 1000
[  156.640971] Rounding down aligned max_sectors from 4294967295 to 4294967168
[  156.641095] db_root: cannot open: /etc/target
[  156.655793] lo speed is unknown, defaulting to 1000
[  156.700482] loop: module loaded
[  156.744820] Loading iSCSI transport class v2.0-870.
[  156.805883] iscsi: registered transport (iser)
[  156.833559] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  156.884254] nvmet_rdma: enabling port 0 (10.0.2.182:4420)
[  157.005564] RPC: Registered rdma transport module.
[  157.005573] RPC: Registered rdma backchannel transport module.
[  157.121039] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0c00c1b2-6456-4e5e-a0d4-9677000ca7bc.
[  157.121388] nvme nvme0: creating 16 I/O queues.
[  157.128894] nvme nvme0: mapped 16/0/0 default/read/poll queues.
[  157.130039] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 10.0.2.182:4420
[  158.196673] XFS (nvme0n1): Mounting V5 Filesystem
[  158.206313] XFS (nvme0n1): Ending clean mount
[  158.207141] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
[  190.087119] XFS (nvme0n1): Unmounting Filesystem
[  190.087284] BUG: Kernel NULL pointer dereference on read at 0x00000000
[  190.087289] Faulting instruction address: 0xc0000000000ae400
[  190.087294] Oops: Kernel access of bad area, sig: 11 [#1]
[  190.087298] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[  190.087302] Modules linked in: ib_isert iscsi_target_mod rpcrdma ib_iser libiscsi scsi_transport_iscsi loop ib_srpt target_core_mod crc32_generic rdma_rxe ib_srp ib_ipoib rdma_ucm ib_uverbs ip6_udp_tunnel udp_tunnel nvme_rdma nvme_fabrics nvme_core ib_umad nvmet_rdma rdma_cm iw_cm ib_cm ib_core nvmet rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd bonding grace fscache rfkill sunrpc xts pseries_rng uio_pdrv_genirq vmx_crypto uio ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp
[  190.087335] CPU: 9 PID: 56 Comm: ksoftirqd/9 Not tainted 5.9.0 #1
[  190.087339] NIP:  c0000000000ae400 LR: c008000002d1f77c CTR: 0000000000000007
[  190.087342] REGS: c00000033b9bf550 TRAP: 0380   Not tainted  (5.9.0)
[  190.087345] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48088280  XER: 20040000
[  190.087351] CFAR: c0000000000ae3a8 IRQMASK: 0 
[  190.087351] GPR00: c008000002d1f77c c00000033b9bf7e0 c000000001842500 c00000033133b086 
[  190.087351] GPR04: 0000000000000000 00000000000003c0 0000000000000007 0000000000000001 
[  190.087351] GPR08: c00000033b9bfa70 c00000033133b086 0000000000000000 c008000002d243f0 
[  190.087351] GPR12: c0000000000b8680 c00000001eca3200 c00000032d2f4000 0000000000000001 
[  190.087351] GPR16: c00000032d2f4000 0000000000000400 0000000000000000 c008000007c3ba38 
[  190.087351] GPR20: 0000000000000000 c000000338a06c00 c00000033b9bfa70 c00000033b9bfa70 
[  190.087351] GPR24: 0000000000000001 c00000033133b086 c000000334083f00 000000005e7d553d 
[  190.087351] GPR28: 0000000000000000 00000000000003c0 00000000000003c0 0000000000000001 
[  190.087383] NIP [c0000000000ae400] memcpy_power7+0xa0/0x7e0
[  190.087388] LR [c008000002d1f77c] rxe_mem_copy+0x1f4/0x308 [rdma_rxe]
[  190.087391] Call Trace:
[  190.087394] [c00000033b9bf7e0] [c000000327d83158] 0xc000000327d83158 (unreliable)
[  190.087399] [c00000033b9bf8e0] [c008000002d1f77c] rxe_mem_copy+0x1f4/0x308 [rdma_rxe]
[  190.087404] [c00000033b9bf970] [c008000002d1fbe4] copy_data+0x11c/0x460 [rdma_rxe]
[  190.087409] [c00000033b9bfa00] [c008000002d131ac] rxe_requester+0x1054/0x1368 [rdma_rxe]
[  190.087414] [c00000033b9bfb50] [c008000002d21298] rxe_do_task+0x110/0x1d0 [rdma_rxe]
[  190.087418] [c00000033b9bfbe0] [c000000000157f40] tasklet_action_common.isra.18+0x1b0/0x1c0
[  190.087424] [c00000033b9bfc40] [c000000000d4e8bc] __do_softirq+0x15c/0x3b4
[  190.087427] [c00000033b9bfd30] [c0000000001577f4] run_ksoftirqd+0x64/0x90
[  190.087432] [c00000033b9bfd50] [c00000000018a794] smpboot_thread_fn+0x204/0x270
[  190.087436] [c00000033b9bfdb0] [c000000000184070] kthread+0x1a0/0x1b0
[  190.087440] [c00000033b9bfe20] [c00000000000d3d0] ret_from_kernel_thread+0x5c/0x6c
[  190.087443] Instruction dump:
[  190.087446] f9c10070 f9e10078 fa010080 fa210088 fa410090 fa610098 fa8100a0 faa100a8 
[  190.087451] fac100b0 f8010110 78a6c9c2 7cc903a6 <e8040000> e8c40008 e8e40010 e9040018 
[  190.087458] ---[ end trace 244246d4a62eb74b ]---
[  190.089228] 
[  191.089236] Kernel panic - not syncing: Fatal exception


# gdb drivers/infiniband/sw/rxe/rdma_rxe.ko 
Reading symbols from drivers/infiniband/sw/rxe/rdma_rxe.ko...done.
(gdb) l *(rxe_mem_copy+0x1f4)
0xf7bc is in rxe_mem_copy (./include/linux/string.h:406).
401			if (q_size < size)
402				__read_overflow2();
403		}
404		if (p_size < size || q_size < size)
405			fortify_panic(__func__);
406		return __underlying_memcpy(p, q, size);
407	}
408	
409	__FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)
410	{
(gdb) 

aarch64:
[  691.557907] r[  691.888082] lo speed is unknown, defaulting to 1000
[  691.892016] lo speed is unknown, defaulting to 1000
[  691.896920] lo speed is unknown, defaulting to 1000
[  691.904834] lo speed is unknown, defaulting to 1000
[  691.908762] lo speed is unknown, defaulting to 1000
[  691.913610] lo speed is unknown, defaulting to 1000
[  693.427251] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
[  730.178748] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  730.186576] Mem abort info:
[  730.189356]   ESR = 0x96000006
[  730.192385]   EC = 0x25: DABT (current EL), IL = 32 bits
[  730.197680]   SET = 0, FnV = 0
[  730.200725]   EA = 0, S1PTW = 0
[  730.203843] Data abort info:
[  730.206707]   ISV = 0, ISS = 0x00000006
[  730.210533]   CM = 0, WnR = 0
[  730.213479] user pgtable: 64k pages, 42-bit VAs, pgdp=00000017b7950000
[  730.219995] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000, pmd=0000000000000000
[  730.230587] Internal error: Oops: 96000006 [#1] SMP
[  730.235441] Modules linked in: loop crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel nvme_rdma nvme_fabrics nvme_core nvmet_rdma nvmet rfkill vfat fat ib_isert iscsi_target_mod rpcrdma ib_srpt target_core_mod sunrpc ib_srp scsi_transport_srp rdma_ucm ib_umad ib_iser ib_ipoib libiscsi rdma_cm scsi_transport_iscsi ib_cm iw_cm dm_multipath mlx5_ib ib_uverbs ib_core acpi_ipmi crct10dif_ce ghash_ce sha2_ce sha256_arm64 ipmi_ssif sha1_ce sbsa_gwdt ipmi_devintf ipmi_msghandler ip_tables xfs libcrc32c sg mlx5_core sdhci_acpi mlxfw tls sdhci qcom_emac mmc_core ahci_platform libahci_platform hdma hdma_mgmt dm_mirror dm_region_hash dm_log dm_mod
[  730.291605] CPU: 24 PID: 131 Comm: ksoftirqd/24 Not tainted 5.8.0+ #2
[  730.298027] Hardware name: WIWYNN Qualcomm Centriq 2400 Reference Evaluation Platform CV90-LA115-P11/Qualcomm Centriq 2400 Customer Reference Board, BIOS 
[  730.311830] pstate: 20400005 (nzCv daif +PAN -UAO BTYPE=--)
[  730.317392] pc : __memcpy+0x100/0x180
[  730.321038] lr : rxe_mem_copy+0x1fc/0x230 [rdma_rxe]
[  730.325978] sp : fffffe001588faf0
[  730.329277] x29: fffffe001588faf0 x28: fffffe00294b1080 
[  730.334572] x27: fffffc17b5065086 x26: 00000000000003c0 
[  730.339867] x25: 0000000000000000 x24: fffffe00113f3000 
[  730.345162] x23: fffffc17ae07c0f0 x22: 00000000288c9f50 
[  730.350457] x21: fffffe001588fc60 x20: 0000000000000001 
[  730.355753] x19: 00000000000003c0 x18: 0000000000000002 
[  730.361048] x17: 1df0130affff0000 x16: 0000000000000001 
[  730.366343] x15: 0000000000000002 x14: 0000000000000000 
[  730.371638] x13: 0000000000000000 x12: 0000000000000000 
[  730.376933] x11: 0000000000000000 x10: 0000000000000000 
[  730.382228] x9 : fffffe000a52dd44 x8 : 0000000000000000 
[  730.387523] x7 : 0000000000000000 x6 : fffffc17b5065086 
[  730.392818] x5 : fffffe001588fc60 x4 : 0000000000000000 
[  730.398113] x3 : 00000000000003c0 x2 : 0000000000000340 
[  730.403409] x1 : 0000000000000000 x0 : fffffc17b5065086 
[  730.408705] Call trace:
[  730.411136]  __memcpy+0x100/0x180
[  730.414438]  copy_data+0xc4/0x318 [rdma_rxe]
[  730.418691]  rxe_requester+0xa58/0xe38 [rdma_rxe]
[  730.423379]  rxe_do_task+0x128/0x200 [rdma_rxe]
[  730.427892]  tasklet_action_common.isra.21+0xfc/0x130
[  730.432924]  tasklet_action+0x2c/0x38
[  730.436570]  __do_softirq+0x128/0x33c
[  730.440215]  run_ksoftirqd+0x40/0x58
[  730.443777]  smpboot_thread_fn+0x168/0x1b0
[  730.447856]  kthread+0x114/0x118
[  730.451066]  ret_from_fork+0x10/0x18
[  730.454627] Code: d503201f d503201f d503201f d503201f (a8c12027) 
[  730.460720] ---[ end trace 7d9bff591d1280d9 ]---
[  730.465302] Kernel panic - not syncing: Fatal exception in interrupt
[  730.471648] SMP: stopping secondary CPUs
[  730.475614] Kernel Offset: disabled
[  730.479017] CPU features: 0x040002,61800418
[  730.483183] Memory Limit: none
[  730.486232] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---


Best Regards,
  Yi Zhang





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux