Re: [PATCH 1/1] cxl/acpi.c: Add buggy BIOS hint for CXL ACPI lookup failure

On 24/04/08 09:34AM, Jonathan Cameron wrote:
> On Sun, 7 Apr 2024 19:03:23 -0700
> PJ Waskiewicz <ppwaskie@xxxxxxxxxx> wrote:
> 
> > On 24/04/07 11:28PM, Lukas Wunner wrote:
> > 
> > Hi Lukas,
> > 
> > > On Sun, Apr 07, 2024 at 02:05:26PM -0700, ppwaskie@xxxxxxxxxx wrote:  
> > > > --- a/drivers/cxl/acpi.c
> > > > +++ b/drivers/cxl/acpi.c
> > > > @@ -504,7 +504,7 @@ static int cxl_get_chbs(struct device *dev, struct acpi_device *hb,
> > > >  
> > > >  	rc = acpi_evaluate_integer(hb->handle, METHOD_NAME__UID, NULL, &uid);
> > > >  	if (rc != AE_OK) {
> > > > -		dev_err(dev, "unable to retrieve _UID\n");
> > > > +		dev_err(dev, "unable to retrieve _UID. Potentially buggy BIOS\n");
> > > >  		return -ENOENT;
> > > >  	}  
> > > 
> > > dev_err(dev, FW_BUG "unable to retrieve _UID\n");
> > >              ^^^^^^
> > > 
> > > There's a macro for that.  
> > 
> > Doh...it's been a while since I've run across buggy BIOSes.  Thanks for
> > the review and comment.
> > 
> > Updated patch:
> > 
> > cxl/acpi.c: Add buggy BIOS hint for CXL ACPI lookup failure
> > 
> > From: PJ Waskiewicz <ppwaskie@xxxxxxxxxx>
> > 
> > Currently, Type 3 CXL devices (CXL.mem) can train using host CXL
> > drivers on Emerald Rapids systems.  However, some production systems
> > from some vendors ship a buggy BIOS that improperly populates the
> > ACPI => PCI mappings.  This causes the cxl_acpi driver to fail probe
> > when it cannot find the root port's _UID, which it needs in order to
> > look up the device's CXL attributes in the CEDT.
> > 
> > Make the error message more descriptive so that it hints the lookup
> > failure could be caused by a bad BIOS, rather than just "failed."
> > 
> > v2: Updated message to use existing FW_BUG macro
> Move the change log "v2..." etc below the ---
> as we don't want it in the long term git log + better to send a fresh
> patch in a separate thread.

Thanks, it's been a while, and my normal (i.e. old) workflow isn't
available to me just yet.  I can fix this and send a new patch, but
I'll hold off and see what Dan's thoughts are after my reply to his
reply.

> Other than that, it seems reasonable to hint that it is probably a BIOS
> bug - however I wonder how many other cases we should do this for and
> whether it is worth the effort of marking them all?

I can confirm this was definitely a BIOS bug in this particular case.
The vendor spun a quick test BIOS for us to try on an EMR and an SPR host,
and the _UIDs were finally correct.  I could successfully walk the CEDT
and get to the CAPS structs I was after (link speed, bus width, etc.).
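
For anyone following along, here is a rough user-space sketch of why a
wrong or missing _UID breaks the lookup.  It is only an illustration:
struct chbs_entry and find_chbs() are hypothetical, simplified stand-ins
that I made up for this mail, not the kernel's CEDT structures or the
cxl_acpi code.  The assumption is simply that the _UID acts as the key
for picking the matching table entry, so no valid _UID means no match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one per-host-bridge table entry. */
struct chbs_entry {
	uint32_t uid;   /* _UID the firmware assigned to the bridge */
	uint64_t base;  /* base address of the bridge's register block */
};

/*
 * Return the entry whose uid matches, or NULL when the table has no
 * entry for that uid (the situation a mis-populated BIOS produces).
 */
const struct chbs_entry *find_chbs(const struct chbs_entry *tbl,
				   size_t n, uint32_t uid)
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].uid == uid)
			return &tbl[i];
	}
	return NULL;
}
```

With a table holding uids 7 and 9, find_chbs(tbl, 2, 9) returns the
second entry, while a uid the table does not contain returns NULL.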

I'd also be fine marking the others, but I don't have an easy way to
validate that I'd hit those cases.  My BIOS for this platform is only
slightly broken.  I suppose the failures could be mocked in QEMU to
trigger those paths...
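
Short of a QEMU setup, the error path itself can at least be exercised
in user space.  The sketch below is not kernel code: uid_eval_fn, the two
mock_uid_* helpers and get_uid_or_report() are hypothetical names I made
up, with the mock standing in for acpi_evaluate_integer().  FW_BUG is
redefined locally with the same "[Firmware Bug]: " prefix the kernel
macro uses, and the message is written to a buffer instead of dev_err().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Local copy of the prefix the kernel's FW_BUG macro expands to. */
#define FW_BUG "[Firmware Bug]: "

/* Hypothetical mock signature: returns 0 on success, non-zero on failure. */
typedef int (*uid_eval_fn)(unsigned long long *uid);

/* Mock of a healthy BIOS: the _UID evaluates fine. */
int mock_uid_ok(unsigned long long *uid)
{
	*uid = 7;
	return 0;
}

/* Mock of a broken BIOS: the _UID evaluation fails. */
int mock_uid_missing(unsigned long long *uid)
{
	(void)uid;
	return 1;
}

/*
 * Mirror the shape of the patched check: on failure, format the
 * FW_BUG-prefixed message into msg and return -ENOENT.
 */
int get_uid_or_report(uid_eval_fn eval, unsigned long long *uid,
		      char *msg, size_t len)
{
	assert(eval && uid && msg && len > 0);

	msg[0] = '\0';
	if (eval(uid) != 0) {
		/* Adjacent string literals concatenate at compile time. */
		snprintf(msg, len, FW_BUG "unable to retrieve _UID\n");
		return -ENOENT;
	}
	return 0;
}
```

Passing mock_uid_missing gives -ENOENT and leaves the prefixed message in
msg; passing mock_uid_ok returns 0 and leaves msg empty.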

-PJ



