On Wed, 2024-05-01 at 12:54 -0500, Bjorn Helgaas wrote:
> On Wed, May 01, 2024 at 08:28:22AM -0700, PJ Waskiewicz wrote:
> > On Mon, 2024-04-29 at 11:35 -0700, Dan Williams wrote:
> > > Bjorn Helgaas wrote:
> > > > On Sun, Apr 28, 2024 at 10:57:13PM -0700, PJ Waskiewicz wrote:
> > > > > On Tue, 2024-04-09 at 08:22 -0500, Bjorn Helgaas wrote:
> > > > > > On Sun, Apr 07, 2024 at 02:05:26PM -0700,
> > > > > > ppwaskie@xxxxxxxxxx wrote:
> > > > > > > From: PJ Waskiewicz <ppwaskie@xxxxxxxxxx>
> > > > > > >
> > > > > > > Currently, Type 3 CXL devices (CXL.mem) can train using
> > > > > > > host CXL drivers on Emerald Rapids systems. However, on
> > > > > > > some production systems from some vendors, a buggy BIOS
> > > > > > > exists that improperly populates the ACPI => PCI
> > > > > > > mappings.
> > > > > >
> > > > > > Can you be more specific about what this ACPI => PCI
> > > > > > mapping is? If you already know what the problem is, I'm
> > > > > > sure this is obvious, but otherwise it's not.
> > > [..]
> > > > It's just a buggy BIOS that doesn't supply _UID for an ACPI0016
> > > > object, so you can't locate the corresponding CEDT entry,
> > > > right?
> > >
> > > Correct, the problem is 100% contained to ACPI, and PCI is
> > > innocent. The ACPI bug leads to failures to associate ACPI
> > > host-bridge objects with CEDT.CHBS entries.
> >
> > Sorry for the confusion here!! I was definitely not trying to
> > blame PCI. :)
> >
> > > ACPI to PCI association is then typical pci_root lookup, i.e.:
> > >
> > >     pci_root = acpi_pci_find_root(hb->handle);
> > >     bridge = pci_root->bus->bridge;
> >
> > Yes, this here. In my use case, I'm starting with a PCIe/CXL
> > device. In my driver, I try to discover the host bridge, and then
> > the ACPI _UID so I can look things up in the CEDT.
> >
> > So I'm trying to do the programmatic equivalent of this:
> >
> > Start here in my PCIe/CXL host driver:
> >
> > /sys/devices/pci0000:37/firmware_node =>
> > ../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:02
> >
> > Retrieve _UID (uid) from /sys/devices/pci0000:37/firmware_node/uid
> >
> > Buggy BIOS, that above value resolves to CX02. In fact, it
> > *should* be 49. This is very much a bug in the ACPI arena.
> >
> > The kernel APIs allowing me to walk this path would fail in the
> > acpi_evaluate_object() when trying to pass in the bad _UID (CX02).
> >
> > Again, sorry for the confusion if it looked like I was trying to
> > implicate PCI in any way. The whole intent here was to leave some
> > breadcrumbs so anyone else running into this wouldn't be left
> > scratching their heads wondering wtf was going on.
> >
> No worries, I didn't suspect a PCI issue here; I just wasn't clear on
> what ACPI=>PCI mapping was involved. It sounds like there *is* no
> such mapping in this picture (you find the ACPI object for a PCIe/CXL
> host bridge, evaluate _UID from that object, and get a bogus value).
>
> So the commit log text:
>
>   However, on some production systems from some vendors, a buggy BIOS
>   exists that improperly populates the ACPI => PCI mappings.
>
> apparently refers to improper implementation of the _UID, which
> doesn't return anything PCI related.

Agreed. I'm happy to fix the commit message to be more accurate, if we
move forward with rolling this or Dan's (better) approach to handling
this.

>
> It also says:
>
>   This leads to the cxl_acpi driver to fail probe when it cannot find
>   the root port's _UID, in order to look up the device's CXL
>   attributes in the CEDT.
>
> I *think* strictly speaking this should refer to the *host bridge's*
> _UID, not the Root Port's, e.g., something like this:
>
>   However, on some production systems from some vendors, a buggy BIOS
>   provides a CXL host bridge _UID that doesn't match anything in the
>   CEDT.

Much better description. I'll roll it in.

I appreciate the look-over and inputs!

-PJ
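
For anyone wanting to check a system for this from userspace, the sysfs walk quoted above could be sketched roughly as follows. This is not part of the patch and is only a minimal sketch: the helper names uid_is_numeric() and read_bridge_uid() are made up for illustration, and it assumes that a usable host bridge _UID reads back as a plain decimal integer (49 in the example above) while the buggy BIOS yields a string such as CX02.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if s is a non-empty run of decimal
 * digits, optionally followed by one trailing newline (sysfs attributes
 * normally end with one), otherwise 0.
 */
static int uid_is_numeric(const char *s)
{
	size_t i, len = strlen(s);

	if (len && s[len - 1] == '\n')
		len--;
	if (!len)
		return 0;
	for (i = 0; i < len; i++)
		if (!isdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

/*
 * Hypothetical helper: read <bridge_dir>/firmware_node/uid into buf,
 * where bridge_dir is something like /sys/devices/pci0000:37.
 * Returns 0 on success, -1 if the attribute cannot be opened or read.
 */
static int read_bridge_uid(const char *bridge_dir, char *buf, size_t buflen)
{
	char path[512];
	FILE *f;

	snprintf(path, sizeof(path), "%s/firmware_node/uid", bridge_dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)buflen, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}
```

A caller would pass the host bridge directory to read_bridge_uid() and then hand the buffer to uid_is_numeric(); a zero result for an ACPI0016 host bridge would point at the kind of bogus _UID described in this thread.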