Dan Williams wrote:
> Ira Weiny wrote:
> > BIOS can configure memory devices as firmware first. This will send CXL
> > events to the firmware instead of the OS. The firmware can then send
> > these events to the OS via UEFI. Currently a configuration such as this
> > will trace a non-standard event in the log. Using the specific CXL
> > trace points with the additional information CXL can provide is much
                          ^^^^^^^^^^^^^^^^^^^^^^^^
See below.

> > more useful to users.
> > Specifically, future support can be added to CXL to
> > provide the DPA to HPA mapping configured at the time of the event.
>
> One could argue that support should have happened first and taken the
> event all the way to EDAC, so this needs to be merged on faith that
> those patches are in flight.

Sure, but then you also don't get the additional information already
provided by the CXL trace points.

Reading this back I see that my use of 'Specifically' masked the fact
that the CXL code already has better support to decode these records
and we want to leverage that.

>
> > UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> > format for CXL Component Events. The format is mostly the same as the
> > CXL Common Event Record Format. The difference is the use of a GUID in
> > the Section Type rather than a UUID as part of the event itself.
> >
> > Add GHES support to detect CXL CPER records and call into the CXL code
> > to process the event.
> >
> > Multiple methods were considered for the call into the CXL code. A
> > notifier chain was considered for the callback but the complexity did
> > not justify the use case.
>
> Not sure what this adds. If you want to talk about alternatives and
> tradeoffs, great, but that should be a comparative analysis in support
> of the chosen direction.

I'll remove it. This was carried over from the previous versions.

>
> > Furthermore, the CXL code is required to be called from process
> > context as it needs to take a device lock so a simple callback
> > register proved difficult. Dan Williams suggested using 2 work items
> > as an atomic way of switching between a callback being registered and
> > not. This allows the callback to run without any locking.[1]
> >
> > Note that a local work item is required to dump any messages seen during
> > a race between any check done in cxl_cper_post_event() and the
> > scheduling of work. That said, no attempt is made to stop the addition
> > of messages into the kfifo because this local work item provides a hook
> > to add a local CXL CPER trace point in a future patch.
> >
> > This new combined patch addresses the report by Dan Carpenter[2]. Thus
> > the Reported-by tag.
> >
> > [1] https://lore.kernel.org/all/65d111eb87115_6c745294ac@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch/
> > [2] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
> >
> > Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
> > Cc: Tony Luck <tony.luck@xxxxxxxxx>
> > Cc: Borislav Petkov <bp@xxxxxxxxx>
> > Reported-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
>
> checkpatch will whine about a missing Closes: tag.

I know. How about Suggested-by? I want to give Dan credit because your
revert already closed the actual bug.
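
For reference, the "2 work items" switch described in the commit message
boils down to swapping a single work_struct pointer on register /
unregister. Roughly like this, building on the declarations in the hunk
below; this is only a sketch and the register/unregister helper names
are illustrative, not what the patch posts:

	/* event path only ever schedules whichever item the pointer names */
	static struct work_struct *cxl_cper_work = &cxl_local_work;

	int cxl_cper_register_callback(cxl_cper_callback callback)
	{
		if (cper_callback)
			return -EINVAL;
		cper_callback = callback;
		/* single pointer store flips which work item future events run */
		WRITE_ONCE(cxl_cper_work, &cxl_cb_work);
		return 0;
	}

	void cxl_cper_unregister_callback(void)
	{
		WRITE_ONCE(cxl_cper_work, &cxl_local_work);
		/* wait out a callback work item that may already be running */
		cancel_work_sync(&cxl_cb_work);
		cper_callback = NULL;
	}

The intent, per the commit message, is that the callback itself then
runs without taking any lock of its own.
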
[snip]

> > +
> > +DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, 32);
> > +static DEFINE_SPINLOCK(cxl_cper_read_lock);
> > +static DEFINE_SPINLOCK(cxl_cper_write_lock);
> > +
> > +static cxl_cper_callback cper_callback;
> > +/* cb function dumps the records */
> > +static void cxl_cper_cb_fn(struct work_struct *work)
> > +{
> > +	struct cxl_cper_work_data wd;
> > +
> > +	while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1,
> > +				    &cxl_cper_read_lock)) {
>
> Why is this taking the lock on retrieval? The work item is
> single-threaded. The only potential contention is between
> cxl_cper_local_fn() and cxl_cper_cb_fn() collision, but that can be
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Because of this collision, yes.

> handled by a cancel_work_sync(&cxl_local_work) on registration to pair
> with the cancel_work_sync(&cxl_cb_work) on unregistration.

Ok, yes, I could use cancel_work_sync().

>
> ...but I am not sure 2 work items are needed unless some default
> processing is going to happen in the "local" case.

I thought about merging patch 3 into this one and that would make the
use of this work function more clear.

>
> > +		cper_callback(wd.event_type, &wd.rec);
> > +	}
> > +}
> > +static DECLARE_WORK(cxl_cb_work, cxl_cper_cb_fn);
> > +
> > +static void cxl_cper_local_fn(struct work_struct *work)
> > +{
> > +	struct cxl_cper_work_data wd;
> > +
> > +	while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1,
> > +				    &cxl_cper_read_lock)) {
>
> This just looks like open coded / less efficient kfifo_reset_out().
>
> > +		/* drop msg */
>
> If the proposal is to do nothing when no callback is registered then no
> need to have cxl_local_work at all.

Patch 4 makes use of this.

>
> > +	}
> > +}
> > +static DECLARE_WORK(cxl_local_work, cxl_cper_local_fn);
> > +
> > +/* Pointer for atomic switch of record processing */
> > +struct work_struct *cxl_cper_work = &cxl_local_work;
> > +
> > +static void cxl_cper_post_event(enum cxl_event_type event_type,
> > +				struct cxl_cper_event_rec *rec)
> > +{
> > +	struct cxl_cper_work_data wd;
> > +
> > +	if (rec->hdr.length <= sizeof(rec->hdr) ||
> > +	    rec->hdr.length > sizeof(*rec)) {
> > +		pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> > +		       rec->hdr.length);
> > +		return;
> > +	}
> > +
> > +	if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
> > +		pr_err(FW_WARN "CXL CPER invalid event\n");
> > +		return;
> > +	}
> > +
> > +	wd.event_type = event_type;
> > +	memcpy(&wd.rec, rec, sizeof(wd.rec));
>
> Unfortunate to have a double copy of the record into the stack variable
> and then again into the kfifo, but I can not immediately think of a way
> around that.

Yep. I could not either.

>
> > +
> > +	kfifo_in_spinlocked(&cxl_cper_fifo, &wd, 1, &cxl_cper_write_lock);
> > +	schedule_work(cxl_cper_work);
>
> I think you don't need 2 work items if you write it this way:
>
> 	spin_lock_irqsave(&cxl_cper_write_lock, flags);
> 	if (cxl_cper_work) {
> 		if (kfifo_put(&cxl_cper_fifo, wd))
> 			schedule_work(cxl_cper_work);
> 		else
> 			pr_err_ratelimited(
> 				"buffer overflow when queuing CXL CPER record\n");
> 	}
> 	spin_unlock_irqrestore(&cxl_cper_write_lock, flags);
>
> ...and then take the write lock when modifying that cxl_cper_work
> pointer between NULL and non-NULL.

I'll merge patch 4 here. Then this will be used.

Ira
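
P.S. With your single work item suggestion, I think the register /
unregister side would look roughly like this, with cxl_cper_work now
starting out NULL (sketch only, assuming the declarations from the hunk
above; the helper names are illustrative):

	static DECLARE_WORK(cxl_cb_work, cxl_cper_cb_fn);
	/* NULL means no callback is registered */
	static struct work_struct *cxl_cper_work;

	int cxl_cper_register_callback(cxl_cper_callback callback)
	{
		unsigned long flags;

		if (cper_callback)
			return -EINVAL;
		cper_callback = callback;

		/* flip the pointer non-NULL under the lock the event path takes */
		spin_lock_irqsave(&cxl_cper_write_lock, flags);
		cxl_cper_work = &cxl_cb_work;
		spin_unlock_irqrestore(&cxl_cper_write_lock, flags);

		return 0;
	}

	void cxl_cper_unregister_callback(void)
	{
		unsigned long flags;

		/* stop new events from scheduling the work item */
		spin_lock_irqsave(&cxl_cper_write_lock, flags);
		cxl_cper_work = NULL;
		spin_unlock_irqrestore(&cxl_cper_write_lock, flags);

		/* flush a work item that may already have been scheduled */
		cancel_work_sync(&cxl_cb_work);
		cper_callback = NULL;
	}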