On Tue, Mar 8, 2022 at 7:51 PM Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Platforms with large BERT table data can trigger soft lockup errors
> while attempting to print the entire BERT table data to the console at
> boot:
>
>   watchdog: BUG: soft lockup - CPU#160 stuck for 23s! [swapper/0:1]
>
> Observed on Ampere Altra systems with a single BERT record of ~250KB.
>
> The original bert driver appears to have assumed relatively small table
> data. Since it is impractical to reassemble large table data from
> interwoven console messages, and the table data is available in
>
>   /sys/firmware/acpi/tables/data/BERT
>
> limit the size for tables printed to the console to 1024 (for no reason
> other than it seemed like a good place to kick off the discussion, would
> appreciate feedback from existing users in terms of what size would
> maintain their current usage model).
>
> Alternatively, we could make printing a CONFIG option, use the
> bert_disable boot arg (or something similar), or use a debug log level.
> However, all those solutions require extra steps or change the existing
> behavior for small table data. Limiting the size preserves existing
> behavior on existing platforms with small table data, and eliminates the
> soft lockups for platforms with large table data, while still making it
> available.
>
> Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
> Cc: Len Brown <lenb@xxxxxxxxxx>
> Cc: James Morse <james.morse@xxxxxxx>
> Cc: Tony Luck <tony.luck@xxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Doug Rady <dcrady@xxxxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx>

Not that I have a particularly strong opinion here, but this looks
reasonable to me, so I've queued it up for 5.18.

APEI reviewers, please chime in if you disagree with the above.

> ---
>  drivers/acpi/apei/bert.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/apei/bert.c b/drivers/acpi/apei/bert.c
> index 19e50fcbf4d6..ad8ab3f12cf3 100644
> --- a/drivers/acpi/apei/bert.c
> +++ b/drivers/acpi/apei/bert.c
> @@ -29,6 +29,7 @@
>
>  #undef pr_fmt
>  #define pr_fmt(fmt) "BERT: " fmt
> +#define ACPI_BERT_PRINT_MAX_LEN 1024
>
>  static int bert_disable;
>
> @@ -58,8 +59,11 @@ static void __init bert_print_all(struct acpi_bert_region *region,
>  	}
>
>  	pr_info_once("Error records from previous boot:\n");
> -
> -	cper_estatus_print(KERN_INFO HW_ERR, estatus);
> +	if (region_len < ACPI_BERT_PRINT_MAX_LEN)
> +		cper_estatus_print(KERN_INFO HW_ERR, estatus);
> +	else
> +		pr_info_once("Max print length exceeded, table data is available at:\n"
> +			     "/sys/firmware/acpi/tables/data/BERT");
>
>  	/*
>  	 * Because the boot error source is "one-time polled" type,
> --
> 2.31.1
>
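
For anyone who wants to poke at the cutoff outside the kernel, here is a rough userspace sketch of the size gate in the hunk above. It is not the driver code: bert_should_print() and bert_report() are hypothetical names made up for illustration, and the hex dump is only a stand-in for cper_estatus_print(). The only things taken from the patch are the 1024 threshold and the strict "<" comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Same value the patch proposes; the patch itself describes it as arbitrary. */
#define ACPI_BERT_PRINT_MAX_LEN 1024

/* Hypothetical helper mirroring `region_len < ACPI_BERT_PRINT_MAX_LEN`. */
bool bert_should_print(size_t region_len)
{
	return region_len < ACPI_BERT_PRINT_MAX_LEN;
}

/*
 * Hypothetical stand-in for the print branch. Dumps the data as hex when it
 * is under the limit, otherwise prints the pointer to the sysfs file.
 * Returns the number of bytes actually dumped.
 */
size_t bert_report(const unsigned char *data, size_t region_len, FILE *out)
{
	size_t i;

	if (!bert_should_print(region_len)) {
		fprintf(out,
			"Max print length exceeded, table data is available at:\n"
			"/sys/firmware/acpi/tables/data/BERT\n");
		return 0;
	}

	for (i = 0; i < region_len; i++) {
		/* 16 bytes per row, and always end the last row with a newline */
		bool end_of_row = (i % 16 == 15) || (i + 1 == region_len);

		fprintf(out, "%02x%s", data[i], end_of_row ? "\n" : " ");
	}
	return region_len;
}
```

Because the comparison is strict, a region of exactly 1024 bytes already takes the "Max print length exceeded" branch, so the largest size that still gets printed is 1023 bytes.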