> On Sep 13, 2020, at 9:36 PM, Zhengyuan Liu <liuzhengyuang521@xxxxxxxxx> wrote:
>
> On Sat, Sep 12, 2020 at 1:37 AM Himanshu Madhani
> <himanshu.madhani@xxxxxxxxxx> wrote:
>>
>> Hi,
>>
>>> On Sep 10, 2020, at 9:26 PM, Zhengyuan Liu <liuzhengyuang521@xxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> There is a NULL pointer dereference panic on my arm64 server when
>>> booting with the fabric line plugged into the QLE2692 HBA. After a
>>> binary search with git bisect I found this panic was introduced by
>>> commit 4984a06bf094 ("scsi: qla2xxx: Remove all rports if fabric scan
>>> retry fails"). Upstream and 4.19-stable both have the same problem
>>> when reset to this point, but upstream fixed it
>>> unintentionally with commit da61ef053bcf ("scsi: qla2xxx: Reduce
>>> holding sess_lock to prevent CPU"), while the latest 4.19-stable still
>>> has this issue. The panic is shown below:
>>>
>>> [ 13.380405][ 0] Unable to handle kernel NULL pointer dereference
>>> at virtual address 0000000000000000
>>> [ 13.390947][ 0] Mem abort info:
>>> [ 13.395535][ 0] ESR = 0x96000045
>>> [ 13.400390][ 0] Exception class = DABT (current EL), IL = 32 bits
>>> [ 13.408089][ 0] SET = 0, FnV = 0
>>> [ 13.412941][ 0] EA = 0, S1PTW = 0
>>> [ 13.416747][ 0] Data abort info:
>>> [ 13.420048][ 0] ISV = 0, ISS = 0x00000045
>>> [ 13.424293][ 0] CM = 0, WnR = 1
>>> [ 13.427676][ 0] user pgtable: 64k pages, 48-bit VAs, pgdp = (____ptrval____)
>>> [ 13.434778][ 0] [0000000000000000] pgd=0000000000000000,
>>> pud=0000000000000000
>>> [ 13.441968][ 0] Internal error: Oops: 96000045 [#1] SMP
>>> [ 13.447250][ 0] Modules linked in: qla2xxx nvme_fc nvme_fabrics
>>> scsi_transport_fc igb megaraid_sas dm_snapshot iscsi_tcp libiscsi_tcp
>>> libs
>>> [ 13.472588][ 0] Process kworker/0:2 (pid: 343, stack limit =
>>> 0x(____ptrval____))
>>> [ 13.472675][ 5] audit: type=1130 audit(1599118767.260:14): pid=1
>>> uid=0 auid=4294967295 ses=4294967295 msg='unit=initrd-parse-etc
>>> comm="sy'
>>> [ 13.480032][ 0] CPU: 0 PID: 343 Comm: kworker/0:2 Tainted: G
>>> W 4.19.90-19.ky10.aarch64 #1
>>> [ 13.480033][ 5] Hardware name: GreatWall, BIOS 601FBE28 2020/04/20
>>> [ 13.480045][ 0] Workqueue: qla2xxx_wq qla2x00_iocb_work_fn [qla2xxx]
>>> [ 13.499248][ 0] audit: type=1131 audit(1599118767.260:15): pid=1
>>> uid=0 auid=4294967295 ses=4294967295 msg='unit=initrd-parse-etc
>>> comm="sy'
>>> [ 13.508759][ 0] pstate: 40000005 (nZcv daif -PAN -UAO)
>>> [ 13.547687][ 24] pc : __memset+0x16c/0x188
>>> [ 13.547697][ 0] lr : qla24xx_async_gpnft+0x194/0x950 [qla2xxx]
>>> [ 13.547701][ 0] sp : ffffb2158236bc60
>>> [ 13.561388][ 0] x29: ffffb2158236bc60 x28: 0000000000000000
>>> [ 13.567104][ 0] x27: ffff3be824ac0148 x26: ffff3be824ac00b8
>>> [ 13.572820][ 0] x25: ffff3be824b031e0 x24: 0000000000000028
>>> [ 13.578535][ 0] x23: ffffb2158600d188 x22: ffffb21586d3ea38
>>> [ 13.584251][ 0] x21: 0000000000008010 x20: ffffb21586d3ea08
>>> [ 13.589968][ 0] x19: ffffb2158600d040 x18: 0000000000000400
>>> [ 13.595683][ 0] x17: 0000000000000000 x16: ffff3be83f9a9500
>>> [ 13.601398][ 0] x15: 0000000000000400 x14: 0000000000000400
>>> [ 13.607114][ 0] x13: 0000000000000189 x12: 0000000000000001
>>> [ 13.612829][ 0] x11: 0000000000000000 x10: 0000000000000b40
>>> [ 13.618544][ 0] x9 : 0000000000000000 x8 : 0000000000000000
>>> [ 13.624259][ 0] x7 : 0000000000000000 x6 : 000000000000003f
>>> [ 13.629974][ 0] x5 : 0000000000000040 x4 : 0000000000000000
>>> [ 13.635689][ 0] x3 : 0000000000000004 x2 : 0000000000007fd0
>>> [ 13.641404][ 0] x1 : 0000000000000000 x0 : 0000000000000000
>>> [ 13.647119][ 0] Call trace:
>>> [ 13.649983][ 0] __memset+0x16c/0x188
>>> [ 13.653718][ 0] qla2x00_do_work+0x398/0x440 [qla2xxx]
>>> [ 13.658920][ 0] qla2x00_iocb_work_fn+0x50/0xe8 [qla2xxx]
>>> [ 13.664378][ 0] process_one_work+0x1f0/0x3c8
>>> [ 13.668797][ 0] worker_thread+0x48/0x4d0
>>> [ 13.672871][ 0] kthread+0x128/0x130
>>> [ 13.676514][ 0] ret_from_fork+0x10/0x18
>>> [ 13.680503][ 0] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
>>> [ 13.687027][ 0] ---[ end trace 258cdcdd74a25238 ]---
>>> [ 13.692051][ 0] Kernel panic - not syncing: Fatal exception
>>
>> Have you tried applying commit da61ef053bcf ("scsi: qla2xxx: Reduce holding sess_lock to prevent CPU") to confirm whether it resolves your panic? It does look like the panic should be resolved by the changes in that patch.
>>
>> If you are able to verify, then we can request a stable backport with your Reported-by and Tested-by tags.
>
> Yes, it did resolve my panic after backporting that commit to
> 4.19-stable. But I could not apply that commit directly; to
> resolve the conflicts I also backported these commits:
> 3b1e23aacf80 ("scsi: qla2xxx: Update rscn_rcvd field to more meaningful").
> a4863b16c31e ("scsi: qla2xxx: Move rport registration out of internal").
>

These patches look good for the 4.19-stable backport. Please post them to stable with Reported-by and Tested-by tags.

Thanks.

>>
>> --
>> Himanshu Madhani Oracle Linux Engineering

--
Himanshu Madhani Oracle Linux Engineering