On Sat Dec 21, 2024 at 10:11 PM EET, Jarkko Sakkinen wrote: > On Sat Dec 21, 2024 at 7:16 PM EET, James Bottomley wrote: > > On Sat, 2024-12-21 at 17:04 +0100, Ard Biesheuvel wrote: > > > On Sat, 21 Dec 2024 at 12:33, Jarkko Sakkinen <jarkko@xxxxxxxxxx> > > > wrote: > > > > > > > > The following failure was reported: > > > > > > > > [ 10.693310][ T1] tpm_tis STM0925:00: 2.0 TPM (device-id 0x3, > > > > rev-id 0) > > > > [ 10.848132][ T1] ------------[ cut here ]------------ > > > > [ 10.853559][ T1] WARNING: CPU: 59 PID: 1 at > > > > mm/page_alloc.c:4727 __alloc_pages_noprof+0x2ca/0x330 > > > > [ 10.862827][ T1] Modules linked in: > > > > [ 10.866671][ T1] CPU: 59 UID: 0 PID: 1 Comm: swapper/0 Not > > > > tainted 6.12.0-lp155.2.g52785e2-default #1 openSUSE Tumbleweed > > > > (unreleased) 588cd98293a7c9eba9013378d807364c088c9375 > > > > [ 10.882741][ T1] Hardware name: HPE ProLiant DL320 > > > > Gen12/ProLiant DL320 Gen12, BIOS 1.20 10/28/2024 > > > > [ 10.892170][ T1] RIP: 0010:__alloc_pages_noprof+0x2ca/0x330 > > > > [ 10.898103][ T1] Code: 24 08 e9 4a fe ff ff e8 34 36 fa ff e9 > > > > 88 fe ff ff 83 fe 0a 0f 86 b3 fd ff ff 80 3d 01 e7 ce 01 00 75 09 > > > > c6 05 f8 e6 ce 01 01 <0f> 0b 45 31 ff e9 e5 fe ff ff f7 c2 00 00 08 > > > > 00 75 42 89 d9 80 e1 > > > > [ 10.917750][ T1] RSP: 0000:ffffb7cf40077980 EFLAGS: 00010246 > > > > [ 10.923777][ T1] RAX: 0000000000000000 RBX: 0000000000040cc0 > > > > RCX: 0000000000000000 > > > > [ 10.931727][ T1] RDX: 0000000000000000 RSI: 000000000000000c > > > > RDI: 0000000000040cc0 > > > > > > > > Above shows that ACPI pointed a 16 MiB buffer for the log events > > > > because RSI maps to the 'order' parameter of > > > > __alloc_pages_noprof(). Address the bug by mapping the region when > > > > needed instead of copying. > > > > > > > > Reported-by: Andy Liang <andy.liang@xxxxxxx> > > > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219495 > > > > Suggested-by: Matthew Garrett <mjg59@xxxxxxxxxxxxx> > > > > Signed-off-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx> > > > > > > This is a very intrusive fix - care to provide some more context on > > > why all these changes are needed? > > > > Since the bug reports never found an actual log over a few tens of > > kilobytes this is caused by the BIOS implementation allocating a huge > > buffer that is mostly unused. > > > > There are two other possibilities for fixing this, which were both part > > of the original suggestions. One would be to work out the size of the > > log and then allocate an exact size. This would require implementing > > tpm1 and tpm2 parsers for log size. However, since we can never go > > over KMALLOC_MAX_SIZE without an error even with this calculated size, > > the simplest straight line fix would be to cap the copy at > > KMALLOC_MAX_SIZE if it's over. That would be a simple one liner. > > All I'm saying is this. > > I've got bunch of complains of this from mainly SUSE, and now I'm > here with a response to that feedback. So I don't care. You decide. > > I'm 100% sure that the fix that Stefan proposed is not a sustainable > path in long-term, so I guess this was more like more long-term but > intrusive fix. > > Ya, and also please test the changes, especially anything that can > reach of OF eventlogs would be welcome feedback. Note that I precision cut that transcript because it includes all infos of the debatable hardware. BR, Jarkko