On Wed, 2019-03-20 at 11:11 -0400, Laurence Oberman wrote:
> On Wed, 2019-03-20 at 09:45 -0400, Laurence Oberman wrote:
> > Hello Bart, I hope all is well with you.
> > 
> > Quick question: preparing to test v5.1-rc2 SRP, my usual method is to
> > first validate the prior kernel I had in place.
> > That kernel (5.0.0-rc2) had passed tests previously, but I had only run
> > the disconnect tests, not the target server reboot test.
> > 
> > Today, with SRP devices mapped, I rebooted the target server and the
> > client panicked.
> > 
> > It's been a while and I have been so busy that I have not kept up with
> > all the fixes. Is this a known issue?
> > 
> > Thanks
> > Laurence
> > 
> > [5414228.917507] scsi host2: ib_srp: Path record query failed: sgid fe80:0000:0000:0000:7cfe:9003:0072:6ed3, dgid fe80:0000:0000:0000:7cfe:9003:0072:6e4f, pkey 0xffff, service_id 0x7cfe900300726e4e
> > [5414229.014355] scsi host2: reconnect attempt 7 failed (-110)
> > [5414239.318161] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318165] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318167] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318168] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318170] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318172] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318173] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318175] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414243.670072] scsi host2: ib_srp: Got failed path rec status -110
> > [5414243.702179] scsi host2: ib_srp: Path record query failed: sgid fe80:0000:0000:0000:7cfe:9003:0072:6ed3, dgid fe80:0000:0000:0000:7cfe:9003:0072:6e4f, pkey 0xffff, service_id 0x7cfe900300726e4e
> > [5414243.799313] scsi host2: reconnect attempt 8 failed (-110)
> > [5414247.510115] scsi host1: ib_srp: Sending CM REQ failed
> > [5414247.510140] scsi host1: reconnect attempt 1 failed (-104)
> > [5414247.849078] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
> > [5414247.893793] #PF error: [normal kernel read fault]
> > [5414247.921839] PGD 0 P4D 0
> > [5414247.937280] Oops: 0000 [#1] SMP PTI
> > [5414247.958332] CPU: 4 PID: 7773 Comm: kworker/4:1H Kdump: loaded Tainted: G          I       5.0.0-rc2+ #2
> > [5414248.012856] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
> > [5414248.026174] device-mapper: multipath: Failing path 8:48.
> > [5414248.050003] Workqueue: kblockd blk_mq_run_work_fn
> > [5414248.108378] RIP: 0010:blk_mq_dispatch_rq_list+0xc9/0x590
> > [5414248.139724] Code: 0f 85 c2 04 00 00 83 44 24 28 01 48 8b 45 00 48 39 c5 0f 84 ea 00 00 00 48 8b 5d 00 80 3c 24 00 4c 8d 6b b8 4c 8b 63 c8 75 25 <49> 8b 84 24 b8 00 00 00 48 8b 40 40 48 8b 40 10 48 85 c0 74 10 4c
> > [5414248.246176] RSP: 0018:ffffb1cd8760fd90 EFLAGS: 00010246
> > [5414248.275599] RAX: ffffa049d67a1308 RBX: ffffa049d67a1308 RCX: 0000000000000004
> > [5414248.316090] RDX: 0000000000000000 RSI: ffffb1cd8760fe20 RDI: ffffa0552ca08000
> > [5414248.356884] RBP: ffffb1cd8760fe20 R08: 0000000000000000 R09: 8080808080808080
> > [5414248.397632] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
> > [5414248.439323] R13: ffffa049d67a12c0 R14: 0000000000000000 R15: ffffa0552ca08000
> > [5414248.481743] FS:  0000000000000000(0000) GS:ffffa04a37880000(0000) knlGS:0000000000000000
> > [5414248.528310] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [5414248.561779] CR2: 00000000000000b8 CR3: 0000000e9d40e004 CR4: 00000000000206e0
> > [5414248.602420] Call Trace:
> > [5414248.616660]  blk_mq_sched_dispatch_requests+0x15c/0x180
> > [5414248.647066]  __blk_mq_run_hw_queue+0x5f/0xf0
> > [5414248.672633]  process_one_work+0x171/0x370
> > [5414248.695443]  worker_thread+0x49/0x3f0
> > [5414248.716730]  kthread+0xf8/0x130
> > [5414248.735085]  ? max_active_store+0x80/0x80
> > [5414248.758569]  ? kthread_bind+0x10/0x10
> > [5414248.779953]  ret_from_fork+0x35/0x40
> > 
> > [5414248.801005] Modules linked in: ib_isert iscsi_target_mod target_core_mod ib_srp rpcrdma scsi_transport_srp rdma_ucm ib_iser ib_ipoib ib_umad rdma_cm libiscsi iw_cm scsi_transport_iscsi ib_cm sunrpc mlx5_ib ib_uverbs ib_core intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel iTCO_wdt crypto_simd cryptd gpio_ich iTCO_vendor_support glue_helper joydev ipmi_si dm_service_time pcspkr ipmi_devintf hpilo hpwdt sg ipmi_msghandler acpi_power_meter lpc_ich i7core_edac pcc_cpufreq dm_multipath ip_tables xfs libcrc32c radeon sd_mod i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx5_core crc32c_intel serio_raw i2c_core hpsa bnx2 scsi_transport_sas mlxfw devlink dm_mirror dm_region_hash dm_log dm_mod
> > [5414249.199354] CR2: 00000000000000b8
> 
> Looking at the vmcore:
> 
> PID: 7773   TASK: ffffa04a2c1e2b80  CPU: 4   COMMAND: "kworker/4:1H"
>  #0 [ffffb1cd8760fab0] machine_kexec at ffffffffaaa6003f
>  #1 [ffffb1cd8760fb08] __crash_kexec at ffffffffaab373ed
>  #2 [ffffb1cd8760fbd0] crash_kexec at ffffffffaab385b9
>  #3 [ffffb1cd8760fbe8] oops_end at ffffffffaaa31931
>  #4 [ffffb1cd8760fc08] no_context at ffffffffaaa6eb59
>  #5 [ffffb1cd8760fcb0] do_page_fault at ffffffffaaa6feb2
>  #6 [ffffb1cd8760fce0] page_fault at ffffffffab2010ee
>     [exception RIP: blk_mq_dispatch_rq_list+201]
>     RIP: ffffffffaad90589  RSP: ffffb1cd8760fd90  RFLAGS: 00010246
>     RAX: ffffa049d67a1308  RBX: ffffa049d67a1308  RCX: 0000000000000004
>     RDX: 0000000000000000  RSI: ffffb1cd8760fe20  RDI: ffffa0552ca08000
>     RBP: ffffb1cd8760fe20   R8: 0000000000000000   R9: 8080808080808080
>     R10: 0000000000000001  R11: 0000000000000001  R12: 0000000000000000
>     R13: ffffa049d67a12c0  R14: 0000000000000000  R15: ffffa0552ca08000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffffb1cd8760fe18] blk_mq_sched_dispatch_requests at ffffffffaad9570c
>  #8 [ffffb1cd8760fe60] __blk_mq_run_hw_queue at ffffffffaad8de3f
>  #9 [ffffb1cd8760fe78] process_one_work at ffffffffaaab0ab1
> #10 [ffffb1cd8760feb8] worker_thread at ffffffffaaab11d9
> #11 [ffffb1cd8760ff10] kthread at ffffffffaaab6758
> #12 [ffffb1cd8760ff50] ret_from_fork at ffffffffab200215
> 
> We were working on this request_queue in blk_mq_sched_dispatch_requests:
> 
> crash> dev -d | grep ffffa0552ca08000
>     8  ffffa055c81b5800   sdd   ffffa0552ca08000       0     0     0
> 
> That device was no longer accessible:
> sdev_state = SDEV_TRANSPORT_OFFLINE,
> 
> So it looks like we tried to process a no-longer-valid list entry in
> blk_mq_dispatch_rq_list:
> 
> /home/loberman/rpmbuild/BUILD/kernel-5.0.0_rc2+/block/blk-mq.h: 211
> 0xffffffffaad90589 <blk_mq_dispatch_rq_list+201>: mov 0xb8(%r12),%rax
> 
> R12 is NULL.
> 
> That instruction is in the inlined helper:
> 
> static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
> {
> 	struct request_queue *q = hctx->queue;
> 
> 	if (q->mq_ops->get_budget)
> 		return q->mq_ops->get_budget(hctx);
> 	return true;
> }
> 
> I will wait for a reply before I try the newer kernel, but this looks
> like a use-after-free to me.
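
To connect the faulting address with the helper quoted above, here is the
same helper annotated with how the oops data appears to map onto it. The
comments only restate what is already in the quoted oops and disassembly
(R12 == 0, a load from offset 0xb8, CR2 == 00000000000000b8); which of the
pointer dereferences in this helper the compiler placed at the faulting
instruction is not spelled out in the thread, so this is a sketch of the
reasoning rather than a confirmed register assignment.

static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
{
	/*
	 * The faulting instruction "mov 0xb8(%r12),%rax" reads offset
	 * 0xb8 through %r12, and %r12 is 0 in the exception frame, so
	 * the access lands on address 00000000000000b8, which is exactly
	 * the CR2 value reported in the oops.
	 */
	struct request_queue *q = hctx->queue;

	/*
	 * The remaining loads (q->mq_ops, then ->get_budget) chase
	 * further pointers off the same hctx/queue state, none of which
	 * can be trusted once the device has gone away.
	 */
	if (q->mq_ops->get_budget)
		return q->mq_ops->get_budget(hctx);
	return true;
}

Either way, the dispatch work dereferenced an object that was no longer
valid, which is consistent with the use-after-free reading above.
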
Hi Laurence,

I don't think that any of the recent SRP initiator changes can be the root
cause of this crash. However, significant changes went upstream in the
block layer core during the v5.1-rc1 merge window, e.g. multi-page bvec
support. Is it possible for you to bisect this kernel oops?

Thanks,

Bart.