On Tue, 20 Nov 2018 13:19:35 +0000 Jonathan Cameron <jonathan.cameron@xxxxxxxxxx> wrote: > On Tue, 20 Nov 2018 13:01:05 +0100 > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > On Thu, Nov 15, 2018 at 11:06:17AM +0000, Jonathan Cameron wrote: > > > The addition of support to read the numa node for a PCI > > > card specified by _PXM resulted in Martin's system not > > > booting. Looking at the ACPI tables it seems that there > > > are PXM entries for the root ports, but no SRAT table. > > > > > > The absence of SRAT table results in dummy_numa_init being > > > called. However, unlike on arm64, this doesn't then result > > > in numa_off being set. When the PCI code later comes along > > > and calls acpi_get_node for any PCI card below the root port, > > > it navigates up the ACPI tree until it finds the PXM value in > > > the root port. This value is then passed to > > > acpi_map_pxm_to_node. If numa_off is set this returns, > > > NUMA_NO_NODE (as it does on arm64), on x86 it instead tries > > > to allocate a numa node from the unused set without setting > > > up all the infrastructure that would normally accompany such > > > a call. We have not identified exactly which driver is > > > causing the subsequent hang for Martin. > > > > > > It is invalid under the ACPI spec to specify new > > > numa nodes using PXM if they have no presence in SRAT. > > > Thus the simplest fix is to set numa_off when it is off due > > > to an invalid SRAT (here not present at all). > > > > > > I do not have easy access to appropriate x86 numa systems so > > > would appreciate some testing of this one! > > > > > > Known problem boards setups: > > > > > > AMD Ryzen Threadripper 2950X on ASROCK X399 TAICHI > > > MSI X399 SLI PLUS (probably - not confirmed yet) > > > > > > The PCI patch has been reverted, so this fix is not critical. > > > > > > Reported-by: Martin Hundeb?ll <martin@xxxxxxxxxx> > > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > > > Fixes: bad7dcd94f39 ("ACPI/PCI: Pay attention to device-specific _PXM node values") > > > > > > --- > > > arch/x86/mm/numa.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c > > > index 1308f5408bf7..ce1182f953ff 100644 > > > --- a/arch/x86/mm/numa.c > > > +++ b/arch/x86/mm/numa.c > > > @@ -695,6 +695,8 @@ static int __init dummy_numa_init(void) > > > node_set(0, numa_nodes_parsed); > > > numa_add_memblk(0, 0, PFN_PHYS(max_pfn)); > > > > > > + numa_off = true; > > > > Should we not: > > > > pr_err(FW_BUG "Invalid SRAT table.\n"); > > > > or something along those lines? > > > > We should take every possibility to call out broken and non-compliant > > firmware. > > While I agree we should definitely be calling this out nice and loud, > not having an SRAT isn't indicating a broken firmware as SRAT is optional > in ACPI. The breakage only occurs much later when the DSDT contains a > PXM entry that doesn't correspond to any entries in SRAT. > > To report at that point, we would need some background info stashed > on why numa_off was set. It could be off because we set it so via > the kernel command line. > > If people are happy with this general direction I'm happy to spin a > patch to do that state tracking and pr_err as a follow up. > Just a quick *bump* on this one as I'd really like to get the PCIe patch that caused the issue back in for the next merge window if possible. There is some related new ACPI spec stuff coming shortly that will be making more use of this. Martin tested the basic principle (if not necessarily exactly this patch) with success. Ingo, did you have a chance to take a look at this for your test system? To my mind it seems like a very straight forward fix, with a small potential that someone is dong something 'interesting' that would stop working due to us blocking the domain lookup when SRAT isn't present. Thanks, Jonathan