Re: [PATCH stable 5.15, 5.10 0/4] Fix EBS volume attach on AWS ARM instances


On 2022-11-28 13:27, Luiz Capitulino wrote:

On 2022-11-28 12:53, Marc Zyngier wrote:
On Mon, 28 Nov 2022 17:08:31 +0000,
Luiz Capitulino <luizcap@xxxxxxxxxx> wrote:
Hi,

[ Marc, can you help reviewing? Esp. the first patch? ]

This series of backports from upstream to stable 5.15 and 5.10 fixes an issue
we're seeing on AWS ARM instances where attaching an EBS volume (which is an
nvme device) to the instance after offlining CPUs causes the device to take
several minutes to show up, and eventually nvme kworkers and other threads
start getting stuck.

This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it
on 5.4. Also, I couldn't reproduce this on x86 even w/ affected kernels.

That's because x86 has a very different allocation policy compared to
what the ITS does. The x86 vector space is tiny, so vectors are only
allocated when required. In your case, that's when the CPUs are
onlined.

With the ITS, all the vectors are allocated upfront, as this is
essentially free. But in the case of managed interrupts, these vectors
are now pointing to offline CPUs. The ITS tries to fix that, but
doesn't nearly have enough information. And the correct course of
action is to keep these interrupts in the shutdown state, which is
what the series is doing.
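For anyone wanting to watch this from userspace: the online mask and the
per-queue interrupt state Marc describes are visible through standard
procfs/sysfs interfaces. A read-only sketch (the IRQ number 42 below is
purely illustrative; substitute a real nvme queue IRQ from /proc/interrupts):

```shell
# Which CPUs are currently online / offline
cat /sys/devices/system/cpu/online
cat /sys/devices/system/cpu/offline

# Per-queue nvme interrupt counts per CPU (prints nothing if no nvme device)
grep -i nvme /proc/interrupts || true

# The affinity the kernel actually programmed for one IRQ
# (42 is illustrative; pick an nvme queue IRQ from the list above)
cat /proc/irq/42/effective_affinity_list 2>/dev/null || true
```

On an affected kernel, the managed vectors for queues mapped to offlined
CPUs are the ones that end up pointing nowhere useful.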

Thank you for the explanation, Marc. I also immensely appreciate the
super fast response! (more below).



An easy reproducer is:

1. Start an ARM instance with 32 CPUs

To satisfy my own curiosity, is that in a guest or bare metal? It
shouldn't make any difference, but hey...

This is a guest. I'll test on a bare-metal instance; it may take a few
hours. I'll reply here.

I was able to test this on a bare-metal instance on both arm64 and x86 with and without this series. It all works as expected.

The only difference is that on the arm64 bare-metal instance, I get
a PCI error on an unfixed kernel (below) and the system never hangs
(whereas on a guest, I get no PCI error and eventually threads start
hanging).

This series fixes this case too and the device is added as expected
on a fixed kernel.

So, all seems good!

[  162.618277] pcieport 0000:14:06.0: bridge window [io  0x1000-0x0fff] to [bus 1b] add_size 1000
[  162.618905] pcieport 0000:14:06.0: BAR 13: no space for [io  size 0x1000]
[  162.619398] pcieport 0000:14:06.0: BAR 13: failed to assign [io  size 0x1000]
[  162.619916] pcieport 0000:14:06.0: BAR 13: no space for [io  size 0x1000]
[  162.620410] pcieport 0000:14:06.0: BAR 13: failed to assign [io  size 0x1000]
[  162.620929] pci 0000:1b:00.0: BAR 0: assigned [mem 0x83200000-0x833fffff 64bit]
[  162.621472] pcieport 0000:14:06.0: PCI bridge to [bus 1b]
[  162.621872] pcieport 0000:14:06.0: bridge window [mem 0x83200000-0x833fffff]
[  162.622398] pcieport 0000:14:06.0: bridge window [mem 0x18019000000-0x18019ffffff 64bit pref]
[  162.623411] nvme 0000:1b:00.0: Adding to iommu group 56
[  162.624081] nvme nvme2: pci function 0000:1b:00.0
[  162.624455] nvme 0000:1b:00.0: enabling device (0000 -> 0002)
[  162.627776] nvme nvme2: Removing after probe failure status: -5
[  187.396805] nvme nvme1: I/O 3 QID 0 timeout, reset controller
[  187.399390] nvme nvme1: Identify namespace failed (-4)
[  187.429068] nvme nvme1: Removing after probe failure status: -5



Anyway, patch #1 looks OK to me, but I haven't tried to dig further
into something that is "oh so last year" ;-). Especially as we're
rewriting the whole of the MSI stack! FWIW:

Acked-by: Marc Zyngier <maz@xxxxxxxxxx>

Thank you again, Marc!



         M.

--
Without deviation from the norm, progress is not possible.
