On 24/04/08 09:34AM, Jonathan Cameron wrote:
> On Sun, 7 Apr 2024 19:03:23 -0700
> PJ Waskiewicz <ppwaskie@xxxxxxxxxx> wrote:
>
> > On 24/04/07 11:28PM, Lukas Wunner wrote:
> >
> > Hi Lukas,
> >
> > > On Sun, Apr 07, 2024 at 02:05:26PM -0700, ppwaskie@xxxxxxxxxx wrote:
> > > > --- a/drivers/cxl/acpi.c
> > > > +++ b/drivers/cxl/acpi.c
> > > > @@ -504,7 +504,7 @@ static int cxl_get_chbs(struct device *dev, struct acpi_device *hb,
> > > >
> > > >  	rc = acpi_evaluate_integer(hb->handle, METHOD_NAME__UID, NULL, &uid);
> > > >  	if (rc != AE_OK) {
> > > > -		dev_err(dev, "unable to retrieve _UID\n");
> > > > +		dev_err(dev, "unable to retrieve _UID. Potentially buggy BIOS\n");
> > > >  		return -ENOENT;
> > > >  	}
> > >
> > > dev_err(dev, FW_BUG "unable to retrieve _UID\n");
> > >              ^^^^^^
> > >
> > > There's a macro for that.
> >
> > Doh...it's been awhile since I've crossed buggy BIOS's. Thanks for the
> > review and comment.
> >
> > Updated patch:
> >
> > cxl/acpi.c: Add buggy BIOS hint for CXL ACPI lookup failure
> >
> > From: PJ Waskiewicz <ppwaskie@xxxxxxxxxx>
> >
> > Currently, Type 3 CXL devices (CXL.mem) can train using host CXL
> > drivers on Emerald Rapids systems. However, on some production
> > systems from some vendors, a buggy BIOS exists that improperly
> > populates the ACPI => PCI mappings. This leads to the cxl_acpi
> > driver to fail probe when it cannot find the root port's _UID, in
> > order to look up the device's CXL attributes in the CEDT.
> >
> > Add a bit more of a descriptive message that the lookup failure
> > could be a bad BIOS, rather than just "failed."
> >
> > v2: Updated message to use existing FW_BUG macro
>
> Move the change log "v2..." etc below the ---
> as we don't want it in the long term git log + better to send a fresh
> patch in a separate thread.

Thanks. It's been a while, and my normal (i.e. old) workflow isn't
available to me quite yet.
I can fix it and send a new patch, but I'll hold off to see what Dan
thinks after my reply to his reply.

> Other than that seems reasonable to hint it is probably a bios
> bug - however I wonder how many other cases we should do this for and
> whether it is worth the effort of marking them all?

I can confirm this was definitely a BIOS bug in this particular case.
The vendor spun a quick test BIOS for us to try on an EMR and an SPR
host, and the _UIDs were finally correct. I could then successfully
walk the CEDT and get to the CAPS structs I was after (link speed,
bus width, etc.).

I'd also be fine marking the others, but I don't have an easy way to
validate that I'd hit those cases; my BIOS for this platform is only
slightly broken. I suppose those failures could be mocked in QEMU...

-PJ