[PATCH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances

Hi,

[ Marc, can you help reviewing? Esp. the first patch? ]

This series of backports from upstream to stable 5.15 and 5.10 fixes an issue
we're seeing on AWS ARM instances: attaching an EBS volume (which shows up as
an NVMe device) to the instance after offlining CPUs causes the device to take
several minutes to appear, and eventually nvme kworkers and other threads get
stuck.

This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it on
5.4, and I also couldn't reproduce it on x86 even with affected kernels.

An easy reproducer is:

1. Start an ARM instance with 32 CPUs
2. Once the instance is booted, offline all CPUs but CPU 0, e.g.:
   # for i in $(seq 1 31); do chcpu -d $i; done
3. Once the CPUs are offline, attach an EBS volume
4. Watch lsblk and dmesg in the instance
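
For convenience, here are the same steps as a rough script. The volume ID,
instance ID, and device name in the attach-volume call are placeholders, and
the CPU range assumes a 32-CPU instance (CPUs 0-31):

   # On the instance: offline every CPU except CPU 0
   for i in $(seq 1 31); do chcpu -d "$i"; done

   # From wherever the AWS CLI is configured: attach the EBS volume
   # (the volume and instance IDs below are made up)
   aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
       --instance-id i-0123456789abcdef0 --device /dev/sdf

   # Back on the instance: watch for the device to show up
   watch -n1 lsblk
   dmesg -w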

Eventually, you get this stack trace:

[   71.842974] pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
[   71.843966] pci 0000:00:1f.0: reg 0x10: [mem 0x00000000-0x00003fff]
[   71.845149] pci 0000:00:1f.0: PME# supported from D0 D1 D2 D3hot D3cold
[   71.846694] pci 0000:00:1f.0: BAR 0: assigned [mem 0x8011c000-0x8011ffff]
[   71.848458] ACPI: \_SB_.PCI0.GSI3: Enabled at IRQ 38
[   71.850852] nvme nvme1: pci function 0000:00:1f.0
[   71.851611] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
[  135.887787] nvme nvme1: I/O 22 QID 0 timeout, completion polled
[  197.328276] nvme nvme1: I/O 23 QID 0 timeout, completion polled
[  197.329221] nvme nvme1: 1/0/0 default/read/poll queues
[  243.408619] INFO: task kworker/u64:2:275 blocked for more than 122 seconds.
[  243.409674]       Not tainted 5.15.79 #1
[  243.410270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.411389] task:kworker/u64:2   state:D stack:    0 pid:  275 ppid:     2 flags:0x00000008
[  243.412602] Workqueue: events_unbound async_run_entry_fn
[  243.413417] Call trace:
[  243.413797]  __switch_to+0x15c/0x1a4
[  243.414335]  __schedule+0x2bc/0x990
[  243.414849]  schedule+0x68/0xf8
[  243.415334]  schedule_timeout+0x184/0x340
[  243.415946]  wait_for_completion+0xc8/0x220
[  243.416543]  __flush_work.isra.43+0x240/0x2f0
[  243.417179]  flush_work+0x20/0x2c
[  243.417666]  nvme_async_probe+0x20/0x3c
[  243.418228]  async_run_entry_fn+0x3c/0x1e0
[  243.418858]  process_one_work+0x1bc/0x460
[  243.419437]  worker_thread+0x164/0x528
[  243.420030]  kthread+0x118/0x124
[  243.420517]  ret_from_fork+0x10/0x20
[  258.768771] nvme nvme1: I/O 20 QID 0 timeout, completion polled
[  320.209266] nvme nvme1: I/O 21 QID 0 timeout, completion polled

For completeness, I tested the same test case on x86 with this series applied
to 5.15.79 and 5.10.155 as well. It works as expected.
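
As a rough sanity check on a patched kernel, the sketch below prints where
each nvme queue interrupt ends up. It assumes the new volume enumerates as
nvme1 (so its IRQs appear as nvme1qN in /proc/interrupts) and that the arch
exposes effective_affinity_list under /proc/irq/:

   # List the nvme1 interrupts and their effective affinity
   nvme_irqs=$(awk -F: '/nvme1q/ {gsub(/ /, "", $1); print $1}' /proc/interrupts)
   for irq in $nvme_irqs; do
       printf 'IRQ %s -> %s\n' "$irq" "$(cat /proc/irq/$irq/effective_affinity_list)"
   done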

Thanks,

Marc Zyngier (4):
  genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  genirq: Always limit the affinity to online CPUs
  irqchip/gic-v3: Always trust the managed affinity provided by the core
    code
  genirq: Take the proposed affinity at face value if force==true

 drivers/irqchip/irq-gic-v3-its.c |  2 +-
 kernel/irq/manage.c              | 31 +++++++++++++++++++++++--------
 kernel/irq/msi.c                 |  7 +++++++
 3 files changed, 31 insertions(+), 9 deletions(-)

-- 
2.37.1



