Re: Panic when rebooting target server testing srp on 5.0.0-rc2

On Wed, 2019-03-20 at 11:11 -0400, Laurence Oberman wrote:
> On Wed, 2019-03-20 at 09:45 -0400, Laurence Oberman wrote:
> > Hello Bart, I hope all is well with you.
> > 
> > Quick question:
> > in preparing to test v5.1-rc2 SRP, my usual method is to first validate
> > the prior kernel I had in place.
> > This had passed tests previously (5.0.0-rc2) but I had not run the
> > target server reboot test, just the disconnect tests.
> > 
> > Today with mapped SRP devices I rebooted the target server and the
> > client panicked.
> > 
> > It's been a while and I have been so busy that I have not kept up with
> > all the fixes. Is this a known issue?
> > 
> > Thanks
> > Laurence
> > 
> > [5414228.917507] scsi host2: ib_srp: Path record query failed: sgid
> > fe80:0000:0000:0000:7cfe:9003:0072:6ed3, dgid
> > fe80:0000:0000:0000:7cfe:9003:0072:6e4f, pkey 0xffff, service_id
> > 0x7cfe900300726e4e
> > [5414229.014355] scsi host2: reconnect attempt 7 failed (-110)
> > [5414239.318161] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318165] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318167] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318168] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318170] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318172] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318173] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414239.318175] scsi host2: ib_srp: Sending CM DREQ failed
> > [5414243.670072] scsi host2: ib_srp: Got failed path rec status -110
> > [5414243.702179] scsi host2: ib_srp: Path record query failed: sgid
> > fe80:0000:0000:0000:7cfe:9003:0072:6ed3, dgid
> > fe80:0000:0000:0000:7cfe:9003:0072:6e4f, pkey 0xffff, service_id
> > 0x7cfe900300726e4e
> > [5414243.799313] scsi host2: reconnect attempt 8 failed (-110)
> > [5414247.510115] scsi host1: ib_srp: Sending CM REQ failed
> > [5414247.510140] scsi host1: reconnect attempt 1 failed (-104)
> > [5414247.849078] BUG: unable to handle kernel NULL pointer
> > dereference
> > at 00000000000000b8
> > [5414247.893793] #PF error: [normal kernel read fault]
> > [5414247.921839] PGD 0 P4D 0 
> > [5414247.937280] Oops: 0000 [#1] SMP PTI
> > [5414247.958332] CPU: 4 PID: 7773 Comm: kworker/4:1H Kdump: loaded
> > Tainted: G          I       5.0.0-rc2+ #2
> > [5414248.012856] Hardware name: HP ProLiant DL380 G7, BIOS P67
> > 08/16/2015
> > [5414248.026174] device-mapper: multipath: Failing path 8:48.
> > 
> > 
> > [5414248.050003] Workqueue: kblockd blk_mq_run_work_fn
> > [5414248.108378] RIP: 0010:blk_mq_dispatch_rq_list+0xc9/0x590
> > [5414248.139724] Code: 0f 85 c2 04 00 00 83 44 24 28 01 48 8b 45 00
> > 48
> > 39 c5 0f 84 ea 00 00 00 48 8b 5d 00 80 3c 24 00 4c 8d 6b b8 4c 8b 63
> > c8
> > 75 25 <49> 8b 84 24 b8 00 00 00 48 8b 40 40 48 8b 40 10 48 85 c0 74
> > 10
> > 4c
> > [5414248.246176] RSP: 0018:ffffb1cd8760fd90 EFLAGS: 00010246
> > [5414248.275599] RAX: ffffa049d67a1308 RBX: ffffa049d67a1308 RCX:
> > 0000000000000004
> > [5414248.316090] RDX: 0000000000000000 RSI: ffffb1cd8760fe20 RDI:
> > ffffa0552ca08000
> > [5414248.356884] RBP: ffffb1cd8760fe20 R08: 0000000000000000 R09:
> > 8080808080808080
> > [5414248.397632] R10: 0000000000000001 R11: 0000000000000001 R12:
> > 0000000000000000
> > [5414248.439323] R13: ffffa049d67a12c0 R14: 0000000000000000 R15:
> > ffffa0552ca08000
> > [5414248.481743] FS:  0000000000000000(0000)
> > GS:ffffa04a37880000(0000)
> > knlGS:0000000000000000
> > [5414248.528310] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [5414248.561779] CR2: 00000000000000b8 CR3: 0000000e9d40e004 CR4:
> > 00000000000206e0
> > [5414248.602420] Call Trace:
> > [5414248.616660]  blk_mq_sched_dispatch_requests+0x15c/0x180
> > [5414248.647066]  __blk_mq_run_hw_queue+0x5f/0xf0
> > [5414248.672633]  process_one_work+0x171/0x370
> > [5414248.695443]  worker_thread+0x49/0x3f0
> > [5414248.716730]  kthread+0xf8/0x130
> > [5414248.735085]  ? max_active_store+0x80/0x80
> > [5414248.758569]  ? kthread_bind+0x10/0x10
> > [5414248.779953]  ret_from_fork+0x35/0x40
> > 
> > [5414248.801005] Modules linked in: ib_isert iscsi_target_mod
> > target_core_mod ib_srp rpcrdma scsi_transport_srp rdma_ucm ib_iser
> > ib_ipoib ib_umad rdma_cm libiscsi iw_cm scsi_transport_iscsi ib_cm
> > sunrpc mlx5_ib ib_uverbs ib_core intel_powerclamp coretemp kvm_intel
> > kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul
> > ghash_clmulni_intel aesni_intel iTCO_wdt crypto_simd cryptd gpio_ich
> > iTCO_vendor_support glue_helper joydev ipmi_si dm_service_time pcspkr
> > ipmi_devintf hpilo hpwdt sg ipmi_msghandler acpi_power_meter lpc_ich
> > i7core_edac pcc_cpufreq dm_multipath ip_tables xfs libcrc32c radeon
> > sd_mod i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
> > fb_sys_fops ttm drm mlx5_core crc32c_intel serio_raw i2c_core hpsa
> > bnx2
> > scsi_transport_sas mlxfw devlink dm_mirror dm_region_hash dm_log
> > dm_mod
> > [5414249.199354] CR2: 00000000000000b8
> > 
> > 
> 
> Looking at the vmcore
> 
> PID: 7773   TASK: ffffa04a2c1e2b80  CPU: 4   COMMAND: "kworker/4:1H"
>  #0 [ffffb1cd8760fab0] machine_kexec at ffffffffaaa6003f
>  #1 [ffffb1cd8760fb08] __crash_kexec at ffffffffaab373ed
>  #2 [ffffb1cd8760fbd0] crash_kexec at ffffffffaab385b9
>  #3 [ffffb1cd8760fbe8] oops_end at ffffffffaaa31931
>  #4 [ffffb1cd8760fc08] no_context at ffffffffaaa6eb59
>  #5 [ffffb1cd8760fcb0] do_page_fault at ffffffffaaa6feb2
>  #6 [ffffb1cd8760fce0] page_fault at ffffffffab2010ee
>     [exception RIP: blk_mq_dispatch_rq_list+201]
>     RIP: ffffffffaad90589  RSP: ffffb1cd8760fd90  RFLAGS: 00010246
>     RAX: ffffa049d67a1308  RBX: ffffa049d67a1308  RCX: 0000000000000004
>     RDX: 0000000000000000  RSI: ffffb1cd8760fe20  RDI: ffffa0552ca08000
>     RBP: ffffb1cd8760fe20   R8: 0000000000000000   R9: 8080808080808080
>     R10: 0000000000000001  R11: 0000000000000001  R12: 0000000000000000
>     R13: ffffa049d67a12c0  R14: 0000000000000000  R15: ffffa0552ca08000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffffb1cd8760fe18] blk_mq_sched_dispatch_requests at
> ffffffffaad9570c
>  #8 [ffffb1cd8760fe60] __blk_mq_run_hw_queue at ffffffffaad8de3f
>  #9 [ffffb1cd8760fe78] process_one_work at ffffffffaaab0ab1
> #10 [ffffb1cd8760feb8] worker_thread at ffffffffaaab11d9
> #11 [ffffb1cd8760ff10] kthread at ffffffffaaab6758
> #12 [ffffb1cd8760ff50] ret_from_fork at ffffffffab200215
> 
> We were working on this request_queue for
>  blk_mq_sched_dispatch_requests
> 
> crash> dev -d | grep ffffa0552ca08000
>     8
> ffffa055c81b5800   sdd        ffffa0552ca08000       0     0     0 
> ]
> 
> That device was no longer accessible 
> 
> sdev_state = SDEV_TRANSPORT_OFFLINE,
> 
> So it looks like we tried to process a no longer valid list entry in 
> blk_mq_dispatch_rq_list
> 
> /home/loberman/rpmbuild/BUILD/kernel-5.0.0_rc2+/block/blk-mq.h: 211
> 0xffffffffaad90589
> <blk_mq_dispatch_rq_list+201>:       mov    0xb8(%r12),%rax
> 
> R12 is NULL
> 
> 
> From
> static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
> {
>         struct request_queue *q = hctx->queue;
> 
>         if (q->mq_ops->get_budget)
>                 return q->mq_ops->get_budget(hctx);
>         return true;
> }
> 
> Will wait for a reply before I try the newer kernel, but this looks
> like a use-after-free to me.
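
A rough illustration of why the fault address equals the instruction's
displacement: the faulting instruction quoted above is
mov 0xb8(%r12),%rax with R12 equal to zero, so the CPU reads from
0 + 0xb8, which matches the CR2 value in the oops. The C sketch below
models only that arithmetic in user space. struct fake_hw_ctx and
field_fault_address() are hypothetical names made up for this note; the
padding merely places a pointer member at offset 0xb8 and is not the
real struct blk_mq_hw_ctx layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct request_queue;           /* opaque here; only a pointer is needed */

/* Hypothetical stand-in: padding so that `queue` sits at offset 0xb8,
 * the displacement seen in the faulting mov instruction. */
struct fake_hw_ctx {
        unsigned char pad[0xb8];
        struct request_queue *queue;
};

/* Compile-time check that the stand-in really has the assumed offset. */
static_assert(offsetof(struct fake_hw_ctx, queue) == 0xb8,
              "queue must sit at offset 0xb8 in the stand-in struct");

/* Address a load of base->queue would touch, computed without ever
 * dereferencing `base` (so a zero base is safe to pass). */
static uintptr_t field_fault_address(uintptr_t base)
{
        return base + offsetof(struct fake_hw_ctx, queue);
}
```

With a base of zero, as R12 was at the time of the fault,
field_fault_address(0) evaluates to 0xb8. A small non-zero faulting
address of this kind usually points at a member access through a NULL
structure pointer rather than a wild pointer.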

Hi Laurence,

I don't think that any of the recent SRP initiator changes can be the root
cause of this crash. However, significant changes went upstream in the block
layer core during the v5.1-rc1 merge window, e.g. multi-page bvec support.
Is it possible for you to bisect this kernel oops?

Thanks,

Bart.