On Thu, Mar 6, 2014 at 1:03 PM, Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx> wrote: > On 3/6/2014 11:40 AM, Bjorn Helgaas wrote: >> >> [+cc Yinghai, sorry I didn't think of it before] >> >> On Wed, Mar 5, 2014 at 11:30 PM, Suravee Suthikulpanit >> <suravee.suthikulpanit@xxxxxxx> wrote: >>> >>> On 3/5/2014 8:13 PM, Suravee Suthikulanit wrote: >>>> >>>> >>>> On 3/5/2014 3:24 PM, Bjorn Helgaas wrote: >>>>> >>>>> >>>>> [+cc linux-acpi] >>>>> >>>>> On Wed, Mar 5, 2014 at 2:06 PM, <suravee.suthikulpanit@xxxxxxx> wrote: >>>>>> >>>>>> >>>>>> From: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx> >>>>>> >>>>>> The current code only supports upto AMD hostbridge for family11h. >>>>>> This causes PCI numa_node information to be reported incorrectly >>>>>> for newer family with multi sockets. >>>>> >>>>> >>>>> >>>>> Where is the incorrect reporting? In ACPI tables? Is this patch a >>>>> way to cover up firmware defects in the ACPI description? Or is this >>>>> for machines without ACPI (it seems unlikely that machines with new >>>>> AMD processors would not have ACPI)? >>>> >>>> >>>> >>>> This is incorrectly reported in the sysfs for each PCI device (e.g. >>>> /devices/pci0000:50/0000:50:00.2/numa_node). Without the patch, they >>>> return -1. >>>> >>>> In file arch/x86/pci/acpi.c, in function pci_acpi_scan_root(), it is >>>> queries the node information as following: >>>> >>>> #ifdef CONFIG_ACPI_NUMA >>>> pxm = acpi_get_pxm(device->handle); >>>> if (pxm >= 0) >>>> node = pxm_to_node(pxm); >>>> if (node != -1) >>>> set_mp_bus_to_node(busnum, node); >>>> else >>>> #endif >>>> node = get_mp_bus_to_node(busnum); >>>> >>>> In this case, I see that the acpi_get_pxm() returns -1. Therefore, it >>>> falls back to using the node information in mp_bus_to_node[]. So, >>>> without this patch, it would also returning -1. >>>> >>>> Also, the spec mentioned that the _PXM is optional, so I am not sure if >>>> this is a firmware bug. >>> >>> >>> I am not quite familiar with the ACPI for this part. However, after >>> taking >>> a look at the code (in driver/acpi/pci_root.c: acpi_pci_root_add()), I >>> believe it's trying to locate _PXM method in the DSDT table, in which I >>> don't see any _PXM methods. >> >> >> This sure looks like a firmware bug. True, _PXM is optional, but if >> the firmware doesn't provide it, nobody should be surprised that the >> OS thinks everything is in the same proximity domain. >> >> I would not endorse extending amd_bus.c for new CPUs. That just >> covers up firmware problems like this, and if you ever run a different >> OS on the box, you'll trip over them again. And I don't think a patch >> like this will even be a possibility for Windows. > > I understand and am trying to verify this with the BIOS engineers. However, > this is currently affecting family15h servers out in the field. We can try > to fix ACPI for newer generation of machines, but it won't be practical to > push this BIOS fix to all the BIOS vendors and system vendors for older > platforms, as they tend to. > > What if I localize the extension to the changes to access node information > in the hostbridge for just the famil15h which is mostly used in our main > server products? Would that be acceptable? I assume the system is fully functional even without these patches, right? The only effect of these changes should be a performance improvement. So the choices are: 1) Change the BIOS to provide _PXM 2) Change Linux with your patches Either way, the customer has to upgrade something. Choice 1) gets you the performance improvement on all Linux and Windows releases, even the ones that are already in the field. Choice 2) fixes future Linux distros and whatever old releases you can convince the distro to backport to, and doesn't help Windows at all. It makes work for all the distros and we (i.e., I) have to worry about maintaining this in the future. I'm curious to see what your BIOS folks say. It seems strange that after all these years, they wouldn't provide something relatively simple like _PXM, so I wonder if there's more to the story. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html