RE: [PATCH v9 1/2] ACPI / APEI: Add support to notify the vendor specific HW errors

Shiju Jose <shiju.jose@xxxxxxxxxx> · Fri, 19 Jun 2020 13:42:04 +0000

Hi James,

Thanks for reviewing the patch and the modifications.

>-----Original Message-----
>From: James Morse [mailto:james.morse@xxxxxxx]
>Sent: 18 June 2020 19:20
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Cc: linux-acpi@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; linux-
>kernel@xxxxxxxxxxxxxxx; rjw@xxxxxxxxxxxxx; bp@xxxxxxxxx; lenb@xxxxxxxxxx;
>tony.luck@xxxxxxxxx; dan.carpenter@xxxxxxxxxx;
>zhangliguang@xxxxxxxxxxxxxxxxx; andriy.shevchenko@xxxxxxxxxxxxxxx;
>Wangkefeng (OS Kernel Lab) <wangkefeng.wang@xxxxxxxxxx>;
>jroedel@xxxxxxx; yangyicong <yangyicong@xxxxxxxxxx>; Jonathan Cameron
><jonathan.cameron@xxxxxxxxxx>; tanxiaofei <tanxiaofei@xxxxxxxxxx>
>Subject: Re: [PATCH v9 1/2] ACPI / APEI: Add support to notify the vendor
>specific HW errors
>
>Hi Shiju,
>
>On 15/06/2020 10:53, Shiju Jose wrote:
>> Add support to notify the vendor specific non-fatal HW errors to the
>> drivers for the error recovery.
>
>This doesn't apply cleanly to v5.8-rc1... thanks for waiting for the merge
>window to finish, but please rebase onto the latest and greatest kernel!

V10 was posted based on v5.8-rc1.
>
>I'm glad the notifier chains for stuff that should be built-in has gone.
>(In my opinion, the RAS code should be moving in the direction of having less
>code run between being told of an error, and the handler running. Notifier
>chains for things like memory-errors was moving in the wrong direction!)
>
>
>The Kfifo and pool are adding complexity I don't think you need.
>Please make it clear from the naming this is for vendor records. (what is an
>event?)
>
>The memcpy() for the records is annoying, but eliminating it takes some
>really invasive changes. Lets live with it for now.
Ok.

>
>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
>> 24c9642e8fc7..854d8115cdfc 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -63,6 +64,11 @@
>>  #define GHES_ESTATUS_CACHES_SIZE	4
>>
>>  #define GHES_ESTATUS_IN_CACHE_MAX_NSEC	10000000000ULL
>> +
>> +#define GHES_EVENT_RING_SIZE	256
>> +#define GHES_GDATA_POOL_MIN_ALLOC_ORDER	3
>> +#define GHES_GDATA_POOL_MIN_SIZE	65536
>
>Huh. Another pool of memory, and we don't know if this will ever be used.
>Can we allocate from ghes_estatus_pool instead?
>
>ghes_estatus_pool is already scaled with the number of error sources
>firmware describes in ghes_estatus_pool_init(), so it should be big enough.
>
>ghes_estatus_pool already has multiple users, estatus_nodes for work
>deferred from NMI come from here, as do ghes_estatus_caches for the low-
>pass filter thing.

Ok.
>
>
>> @@ -122,6 +128,19 @@ static DEFINE_MUTEX(ghes_list_mutex);
>>   */
>>  static DEFINE_SPINLOCK(ghes_notify_lock_irq);
>>
>> +struct ghes_event_entry {
>
>ghes_vendor_record_entry ?
>
>> +	struct acpi_hest_generic_data *gdata;
>> +	int error_severity;
>> +};
>
>> +static DEFINE_KFIFO(ghes_event_ring, struct ghes_event_entry,
>> +		    GHES_EVENT_RING_SIZE);
>> +
>> +static DEFINE_SPINLOCK(ghes_event_ring_lock);
>
>Do you need the FIFO behaviour?
>If you put a work_struct in the struct and schedule_work() that, these would
>run in any order, and it would be less code.
>
>
>> +static struct gen_pool *ghes_gdata_pool; static unsigned long
>> +ghes_gdata_pool_size_request;
>> +
>>  static struct gen_pool *ghes_estatus_pool;  static unsigned long
>> ghes_estatus_pool_size_request;
>
>Please use the existing ghes_estatus_pool.
>
>
>> @@ -188,6 +207,40 @@ int ghes_estatus_pool_init(int num_ghes)
>
>[...]
>
>> +static int ghes_gdata_pool_init(void) {
>> +	unsigned long addr, len;
>> +	int rc;
>> +
>> +	ghes_gdata_pool =
>gen_pool_create(GHES_GDATA_POOL_MIN_ALLOC_ORDER, -1);
>> +	if (!ghes_gdata_pool)
>> +		return -ENOMEM;
>> +
>> +	if (ghes_gdata_pool_size_request < GHES_GDATA_POOL_MIN_SIZE)
>> +		ghes_gdata_pool_size_request =
>GHES_GDATA_POOL_MIN_SIZE;
>> +
>> +	len = ghes_gdata_pool_size_request;
>> +	addr = (unsigned long)vmalloc(PAGE_ALIGN(len));
>> +	if (!addr)
>> +		goto err_pool_alloc;
>
>> +	vmalloc_sync_mappings();
>(This isn't needed anymore. See commit 73f693c3a705 ("mm: remove
>vmalloc_sync_(un)mappings()"))
>
>
>> +	rc = gen_pool_add(ghes_gdata_pool, addr, PAGE_ALIGN(len), -1);
>> +	if (rc)
>> +		goto err_pool_add;
>> +
>> +	return 0;
>> +
>> +err_pool_add:
>> +	vfree((void *)addr);
>> +
>> +err_pool_alloc:
>> +	gen_pool_destroy(ghes_gdata_pool);
>> +
>> +	return -ENOMEM;
>> +}
>
>But: using ghes_estatus_pool would avoid this duplication.
>
>
>> @@ -247,6 +300,10 @@ static struct ghes *ghes_new(struct
>acpi_hest_generic *generic)
>>  		goto err_unmap_status_addr;
>>  	}
>>
>> +	ghes_gdata_pool_size_request += generic->records_to_preallocate *
>> +					generic->max_sections_per_record *
>> +					generic->max_raw_data_length;
>> +
>
>Careful, I think ghes_probe() can run in parallel on different CPUs. You can
>certainly unbind/rebind it from user-space.
>
>I recall these max this/that/preallocate stuff are junk values on some
>platform.
>You'd at least need to cap it to sane maximum value.
>
>But: Using ghes_estatus_pool would use ghes_estatus_pool_init()'s sizes,
>which allocates 64K for each error source.
>
>History: https://www.spinics.net/lists/linux-acpi/msg84238.html
>
>
>> @@ -490,6 +547,68 @@ static void ghes_handle_aer(struct
>> acpi_hest_generic_data *gdata)
>
>[...]
>
>> +static void ghes_event_work_func(struct work_struct *work) {
>> +	struct ghes_event_entry entry;
>> +	u32 len;
>> +
>> +	while (kfifo_get(&ghes_event_ring, &entry)) {
>> +		blocking_notifier_call_chain(&ghes_event_notify_list,
>> +					     entry.error_severity,
>> +					     entry.gdata);
>> +		len = acpi_hest_get_record_size(entry.gdata);
>> +		gen_pool_free(ghes_gdata_pool, (unsigned long)entry.gdata,
>len);
>> +	}
>> +}
>> +
>> +static DECLARE_WORK(ghes_event_work, ghes_event_work_func);
>> +
>> +static void ghes_handle_non_standard_event(struct
>acpi_hest_generic_data *gdata,
>> +					   int sev)
>> +{
>> +	u32 len;
>
>> +	struct ghes_event_entry event_entry;
>
>> +	len = acpi_hest_get_record_size(gdata);
>> +	event_entry.gdata = (void *)gen_pool_alloc(ghes_gdata_pool, len);
>> +	if (event_entry.gdata) {
>> +		memcpy(event_entry.gdata, gdata, len);
>> +		event_entry.error_severity = sev;
>> +
>> +		if (kfifo_in_spinlocked(&ghes_event_ring, &event_entry, 1,
>
>... event_entry is on the stack ...
>
Ok.

>
>> +					&ghes_event_ring_lock))
>> +			schedule_work(&ghes_event_work);
>> +		else
>> +			pr_warn(GHES_PFX "ghes event queue full\n");
>> +	}
>> +}
>
>
>I think the kfifo is adding un-needed complexity here.
Ok.
>
>
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index
>> e3f1cddb4ac8..a3dd82069069 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -50,6 +50,34 @@ enum {
>
[...]
>> +void ghes_unregister_event_notifier(struct notifier_block *nb); #else
>
>Please make it clear from the names these are for vendor events, that the
>kernel would otherwise ignore. It looks like these are for everything. Drivers
>have no business trying to handle the errors that are handled by things like
>memory_failure().
>
>~
>
>I would post a version of this to illustrate, but there are comments on patch 2
>too.
>
>Something like:
>http://www.linux-arm.org/git?p=linux-
>jm.git;a=commitdiff;h=9c6859f3146001cd9f8edfaf965232cb99c7dc42
>
>(caveat emptor: I've only build tested it)

I tested your changes and worked fine.
Should I send this patch along with the updated patch 2?

>
>
>Thanks,
>
>James

Thanks,
Shiju