Re: [PATCHv2 3/5] PCI/ERR: Retain status from error notification

"Kuppuswamy, Sathyanarayanan" <sathyanarayanan.kuppuswamy@xxxxxxxxx> · Tue, 2 Mar 2021 21:46:40 -0800

On 3/2/21 9:34 PM, Williams, Dan J wrote:
[ Add Sathya ]

On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote:
Overwriting the frozen detected status with the result of the link reset
loses the NEED_RESET result that drivers are depending on for error
handling to report the .slot_reset() callback. Retain this status so
that subsequent error handling has the correct flow.

Reported-by: Hinko Kocevar <hinko.kocevar@xxxxxx>
Acked-by: Sean V Kelley <sean.v.kelley@xxxxxxxxx>
Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
Just want to report that this fix might be a candidate for -stable.
Agree.

I think it can be merged in both stable and mainline kernels.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
Recent DPC error recovery testing on v5.10 shows that losing this
status prevents NVME from restarting the queue after error recovery
with a failing signature like:

[  424.685179] pcie_do_recovery: pcieport 0000:97:02.0: AER: broadcast error_detected message
[  424.685185] nvme nvme0: frozen state error detected, reset controller
[  425.078620] pcie_do_recovery: pcieport 0000:97:02.0: AER: broadcast resume message
[  425.078674] pcieport 0000:97:02.0: AER: device recovery successful

...and then later:

[  751.146277] sysrq: Show Blocked State
[  751.146431] task:touch           state:D stack:    0 pid:11969 ppid: 11413 flags:0x00000000
[  751.146434] Call Trace:
[  751.146443]  __schedule+0x31b/0x890
[  751.146445]  schedule+0x3c/0xa0
[  751.146449]  blk_queue_enter+0x151/0x220
[  751.146454]  ? finish_wait+0x80/0x80
[  751.146455]  submit_bio_noacct+0x39a/0x410
[  751.146457]  submit_bio+0x42/0x150
[  751.146460]  ? bio_add_page+0x62/0x90
[  751.146461]  ? guard_bio_eod+0x25/0x70
[  751.146465]  submit_bh_wbc+0x16d/0x190
[  751.146469]  ext4_read_bh+0x47/0xa0
[  751.146472]  ext4_read_inode_bitmap+0x3cd/0x5f0

...because NVME was only told to disable, never to reset and resume the
block queue.

I have not tested it, but by code inspection I eventually found this
upstream fix.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer