On Mon, Mar 17, 2014 at 11:18 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote: > [+cc Rafael] > > On Thu, Mar 13, 2014 at 8:06 PM, Suravee Suthikulpanit > <suravee.suthikulpanit@xxxxxxx> wrote: >> On 3/12/2014 4:13 PM, Bjorn Helgaas wrote: >>>> >>>> I assume the system is fully functional even without these patches, >>>> >right? The only effect of these changes should be a performance >>>> >improvement. >> >> [Suravee] Yes, the system is fully functional except the numa information >> for PCI ethernet adapters is returning -1. > > This would normally be determined by the _PXM of the PCI host bridge > leading to the NIC. The DSDT you attached to the bug report contains > no _PXM methods at all, so we have no idea what the NUMA information > should be. > >>>> >So the choices are: >>>> > >>>> > 1) Change the BIOS to provide _PXM >>>> > 2) Change Linux with your patches >>>> > >>>> >Either way, the customer has to upgrade something. Choice 1) gets you >>>> >the performance improvement on all Linux and Windows releases, even >>>> >the ones that are already in the field. >> >> [Suravee] I agree with you on both cases and I am working with the BIOS team >> to see if they can look into this and provide fixes for the future. My only >> concern is the device _PXM information might be specific to each platform >> vendors since they are the one deciding the number of northbridge to be put >> on the systems. Also, it's quite difficult to influence HW/BIOS vendor to >> provide fixes for older platforms unless some important customers are >> complaining to them. > > Yes, you're right that _PXM describes platform topology and can't be > implemented by the CPU vendor because it doesn't know the topology of > the system where the CPUs will end up. > >>>> >Choice 2) fixes future Linux >>>> >distros and whatever old releases you can convince the distro to >>>> >backport to, and doesn't help Windows at all. It makes work for all >>>> >the distros and we (i.e., I) have to worry about maintaining this in >>>> >the future. >> >> [Suravee] Actually, this was originally reported by Redhat (please see >> https://bugzilla.redhat.com/show_bug.cgi?id=1040440), and they are waiting >> for the fixes if we decide to fix it in the upstream Linux code base. > > This RH bugzilla is private, so I can't see it. > >> [Suravee] I am not sure which table is the _PXM method supposed to be apart >> of. > > It would be in the DSDT (or an SSDT). In this case, the PCI host > bridges (the PNP0A08 devices) are described in the DSDT you posted, so > _PXM would normally be there, too. > >>> Can you collect an acpidump and complete dmesg log from a system with >>> the problem? >> >> [Suravee] Please see https://bugzilla.kernel.org/show_bug.cgi?id=72051 >> I have included SRAT, SLIT and DSDT table as part of the bug. Please let me >> know if you are looking for more information. > > One of my objections to your patches is that if we include them, it > will reinforce the perception that the BIOS is doing things correctly, > so there will be less incentive to actually fix the problem by > following the spec. This has happened several times in the past, > e.g., quirk_amd_nb_node() and amd_bus.c itself. > > Another is more of a question: as coded, your patch assumes *all* AMD > host bridge devices will have these registers at those locations > forever. Are those registers documented as being architected? > > Finally, your patch extracts a 3-bit number from a CPU register and > uses it as the node number. The node numbers used by Linux are > logical and there's no reason they need to be identical to settings in > the CPU registers. So if we got some node information in the normal > way (from _PXM, SLIT, SRAT, etc.) and some from your patch, there's no > reason to believe they would be compatible. So where do we go from here? I'm willing to merge a patch to work around this BIOS defect, but I think we need to iterate on this a little bit. 1) If these registers are publicly documented as architected for all AMD host bridges, your approach is fine. Otherwise, I think we need to continue with the existing list of devices. 2) The hardware node number space is distinct from the Linux node number space, and Linux doesn't know the mapping between them. Since the patch assumes these spaces are identical, we need to log a message indicating that we're working around a BIOS defect, and the topology information may not be reliable. Otherwise, we're setting ourselves up for nasty debugging efforts later. 3) I think we should take this opportunity to get rid of quirk_amd_nb_node(), as Myron mentioned. This doesn't help for the bug you're working around, but it's related code that seems to be redundant. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html