Re: qla2xxx panic with 4.19-stable

Himanshu Madhani <himanshu.madhani@xxxxxxxxxx> · Fri, 11 Sep 2020 12:37:23 -0500

Hi,

> On Sep 10, 2020, at 9:26 PM, Zhengyuan Liu <liuzhengyuang521@xxxxxxxxx> wrote:
> 
> Hi,
> 
> There is a NULL pointer dereference panic on my arm64 server when it
> boots with the fabric line plugged into the QLE2692 HBA. After
> bisecting with git bisect, I found that this panic was introduced by
> commit 4984a06bf094 ("scsi: qla2xxx: Remove all rports if fabric scan
> retry fails"). Both upstream and 4.19-stable have the same problem
> when reset to this point, but upstream fixed it unintentionally
> with commit da61ef053bcf ("scsi: qla2xxx: Reduce
> holding sess_lock to prevent CPU"), while the latest 4.19-stable still
> has this issue. The panic is shown below:
> 
> [   13.380405][  0] Unable to handle kernel NULL pointer dereference
> at virtual address 0000000000000000
> [   13.390947][  0] Mem abort info:
> [   13.395535][  0]   ESR = 0x96000045
> [   13.400390][  0]   Exception class = DABT (current EL), IL = 32 bits
> [   13.408089][  0]   SET = 0, FnV = 0
> [   13.412941][  0]   EA = 0, S1PTW = 0
> [   13.416747][  0] Data abort info:
> [   13.420048][  0]   ISV = 0, ISS = 0x00000045
> [   13.424293][  0]   CM = 0, WnR = 1
> [   13.427676][  0] user pgtable: 64k pages, 48-bit VAs, pgdp = (____ptrval____)
> [   13.434778][  0] [0000000000000000] pgd=0000000000000000,
> pud=0000000000000000
> [   13.441968][  0] Internal error: Oops: 96000045 [#1] SMP
> [   13.447250][  0] Modules linked in: qla2xxx nvme_fc nvme_fabrics
> scsi_transport_fc igb megaraid_sas dm_snapshot iscsi_tcp libiscsi_tcp
> libs
> [   13.472588][  0] Process kworker/0:2 (pid: 343, stack limit =
> 0x(____ptrval____))
> [   13.472675][  5] audit: type=1130 audit(1599118767.260:14): pid=1
> uid=0 auid=4294967295 ses=4294967295 msg='unit=initrd-parse-etc
> comm="sy'
> [   13.480032][  0] CPU: 0 PID: 343 Comm: kworker/0:2 Tainted: G
> W         4.19.90-19.ky10.aarch64 #1
> [   13.480033][  5] Hardware name: GreatWall, BIOS 601FBE28 2020/04/20
> [   13.480045][  0] Workqueue: qla2xxx_wq qla2x00_iocb_work_fn [qla2xxx]
> [   13.499248][  0] audit: type=1131 audit(1599118767.260:15): pid=1
> uid=0 auid=4294967295 ses=4294967295 msg='unit=initrd-parse-etc
> comm="sy'
> [   13.508759][  0] pstate: 40000005 (nZcv daif -PAN -UAO)
> [   13.547687][ 24] pc : __memset+0x16c/0x188
> [   13.547697][  0] lr : qla24xx_async_gpnft+0x194/0x950 [qla2xxx]
> [   13.547701][  0] sp : ffffb2158236bc60
> [   13.561388][  0] x29: ffffb2158236bc60 x28: 0000000000000000
> [   13.567104][  0] x27: ffff3be824ac0148 x26: ffff3be824ac00b8
> [   13.572820][  0] x25: ffff3be824b031e0 x24: 0000000000000028
> [   13.578535][  0] x23: ffffb2158600d188 x22: ffffb21586d3ea38
> [   13.584251][  0] x21: 0000000000008010 x20: ffffb21586d3ea08
> [   13.589968][  0] x19: ffffb2158600d040 x18: 0000000000000400
> [   13.595683][  0] x17: 0000000000000000 x16: ffff3be83f9a9500
> [   13.601398][  0] x15: 0000000000000400 x14: 0000000000000400
> [   13.607114][  0] x13: 0000000000000189 x12: 0000000000000001
> [   13.612829][  0] x11: 0000000000000000 x10: 0000000000000b40
> [   13.618544][  0] x9 : 0000000000000000 x8 : 0000000000000000
> [   13.624259][  0] x7 : 0000000000000000 x6 : 000000000000003f
> [   13.629974][  0] x5 : 0000000000000040 x4 : 0000000000000000
> [   13.635689][  0] x3 : 0000000000000004 x2 : 0000000000007fd0
> [   13.641404][  0] x1 : 0000000000000000 x0 : 0000000000000000
> [   13.647119][  0] Call trace:
> [   13.649983][  0]  __memset+0x16c/0x188
> [   13.653718][  0]  qla2x00_do_work+0x398/0x440 [qla2xxx]
> [   13.658920][  0]  qla2x00_iocb_work_fn+0x50/0xe8 [qla2xxx]
> [   13.664378][  0]  process_one_work+0x1f0/0x3c8
> [   13.668797][  0]  worker_thread+0x48/0x4d0
> [   13.672871][  0]  kthread+0x128/0x130
> [   13.676514][  0]  ret_from_fork+0x10/0x18
> [   13.680503][  0] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
> [   13.687027][  0] ---[ end trace 258cdcdd74a25238 ]---
> [   13.692051][  0] Kernel panic - not syncing: Fatal exception

Have you tried applying commit da61ef053bcf ("scsi: qla2xxx: Reduce holding sess_lock to prevent CPU") to confirm whether it resolves your panic? It does look like the changes in that patch should resolve it.

If you are able to verify this, we can request a stable backport with your Reported-by and Tested-by tags.

--
Himanshu Madhani	 Oracle Linux Engineering