On Sat, Sep 12, 2020 at 1:37 AM Himanshu Madhani <himanshu.madhani@xxxxxxxxxx> wrote:
>
> Hi,
>
> > On Sep 10, 2020, at 9:26 PM, Zhengyuan Liu <liuzhengyuang521@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > There is a NULL pointer dereference panic on my arm64 server when
> > booting with the fabric line plugged into the QLE2692 HBA. After
> > a binary search with git bisect, I found that this panic was introduced
> > by commit 4984a06bf094 ("scsi: qla2xxx: Remove all rports if fabric scan
> > retry fails"). Upstream and 4.19-stable both have the same problem
> > when reset to this point, but upstream fixed it unintentionally
> > in commit da61ef053bcf ("scsi: qla2xxx: Reduce
> > holding sess_lock to prevent CPU"), while the latest 4.19-stable still
> > has this issue. The panic looks as follows:
> >
> > [ 13.380405][ 0] Unable to handle kernel NULL pointer dereference
> > at virtual address 0000000000000000
> > [ 13.390947][ 0] Mem abort info:
> > [ 13.395535][ 0] ESR = 0x96000045
> > [ 13.400390][ 0] Exception class = DABT (current EL), IL = 32 bits
> > [ 13.408089][ 0] SET = 0, FnV = 0
> > [ 13.412941][ 0] EA = 0, S1PTW = 0
> > [ 13.416747][ 0] Data abort info:
> > [ 13.420048][ 0] ISV = 0, ISS = 0x00000045
> > [ 13.424293][ 0] CM = 0, WnR = 1
> > [ 13.427676][ 0] user pgtable: 64k pages, 48-bit VAs, pgdp = (____ptrval____)
> > [ 13.434778][ 0] [0000000000000000] pgd=0000000000000000,
> > pud=0000000000000000
> > [ 13.441968][ 0] Internal error: Oops: 96000045 [#1] SMP
> > [ 13.447250][ 0] Modules linked in: qla2xxx nvme_fc nvme_fabrics
> > scsi_transport_fc igb megaraid_sas dm_snapshot iscsi_tcp libiscsi_tcp
> > libs
> > [ 13.472588][ 0] Process kworker/0:2 (pid: 343, stack limit =
> > 0x(____ptrval____))
> > [ 13.472675][ 5] audit: type=1130 audit(1599118767.260:14): pid=1
> > uid=0 auid=4294967295 ses=4294967295 msg='unit=initrd-parse-etc
> > comm="sy'
> > [ 13.480032][ 0] CPU: 0 PID: 343 Comm: kworker/0:2 Tainted: G
> > W 4.19.90-19.ky10.aarch64 #1
> > [ 13.480033][ 5] Hardware name: GreatWall, BIOS 601FBE28 2020/04/20
> > [ 13.480045][ 0] Workqueue: qla2xxx_wq qla2x00_iocb_work_fn [qla2xxx]
> > [ 13.499248][ 0] audit: type=1131 audit(1599118767.260:15): pid=1
> > uid=0 auid=4294967295 ses=4294967295 msg='unit=initrd-parse-etc
> > comm="sy'
> > [ 13.508759][ 0] pstate: 40000005 (nZcv daif -PAN -UAO)
> > [ 13.547687][ 24] pc : __memset+0x16c/0x188
> > [ 13.547697][ 0] lr : qla24xx_async_gpnft+0x194/0x950 [qla2xxx]
> > [ 13.547701][ 0] sp : ffffb2158236bc60
> > [ 13.561388][ 0] x29: ffffb2158236bc60 x28: 0000000000000000
> > [ 13.567104][ 0] x27: ffff3be824ac0148 x26: ffff3be824ac00b8
> > [ 13.572820][ 0] x25: ffff3be824b031e0 x24: 0000000000000028
> > [ 13.578535][ 0] x23: ffffb2158600d188 x22: ffffb21586d3ea38
> > [ 13.584251][ 0] x21: 0000000000008010 x20: ffffb21586d3ea08
> > [ 13.589968][ 0] x19: ffffb2158600d040 x18: 0000000000000400
> > [ 13.595683][ 0] x17: 0000000000000000 x16: ffff3be83f9a9500
> > [ 13.601398][ 0] x15: 0000000000000400 x14: 0000000000000400
> > [ 13.607114][ 0] x13: 0000000000000189 x12: 0000000000000001
> > [ 13.612829][ 0] x11: 0000000000000000 x10: 0000000000000b40
> > [ 13.618544][ 0] x9 : 0000000000000000 x8 : 0000000000000000
> > [ 13.624259][ 0] x7 : 0000000000000000 x6 : 000000000000003f
> > [ 13.629974][ 0] x5 : 0000000000000040 x4 : 0000000000000000
> > [ 13.635689][ 0] x3 : 0000000000000004 x2 : 0000000000007fd0
> > [ 13.641404][ 0] x1 : 0000000000000000 x0 : 0000000000000000
> > [ 13.647119][ 0] Call trace:
> > [ 13.649983][ 0] __memset+0x16c/0x188
> > [ 13.653718][ 0] qla2x00_do_work+0x398/0x440 [qla2xxx]
> > [ 13.658920][ 0] qla2x00_iocb_work_fn+0x50/0xe8 [qla2xxx]
> > [ 13.664378][ 0] process_one_work+0x1f0/0x3c8
> > [ 13.668797][ 0] worker_thread+0x48/0x4d0
> > [ 13.672871][ 0] kthread+0x128/0x130
> > [ 13.676514][ 0] ret_from_fork+0x10/0x18
> > [ 13.680503][ 0] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
> > [ 13.687027][ 0] ---[ end trace 258cdcdd74a25238 ]---
> > [ 13.692051][ 0] Kernel panic - not syncing: Fatal exception
>
> Have you tried applying commit da61ef053bcf ("scsi: qla2xxx: Reduce holding sess_lock to prevent CPU") to confirm whether it resolves your panic? It does look like the panic should be resolved by the changes in that patch.
>
> If you are able to verify, then we can request a stable backport with your Reported-by and Tested-by tags.

Yes, backporting that commit to 4.19-stable did resolve my panic. However, the commit does not apply directly; to resolve the conflicts I also backported the following commits:

3b1e23aacf80 ("scsi: qla2xxx: Update rscn_rcvd field to more meaningful")
a4863b16c31e ("scsi: qla2xxx: Move rport registration out of internal")

>
> --
> Himanshu Madhani
> Oracle Linux Engineering
>