On Wed, Jan 29, 2025 at 5:33 AM Nathan Chancellor <nathan@xxxxxxxxxx> wrote:
>
> On Thu, Jan 23, 2025 at 08:35:51PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Jan 21, 2025 at 3:23 AM Xiaofei Tan <tanxiaofei@xxxxxxxxxx> wrote:
> > >
> > >
> > > On 2025/1/20 19:04, Jonathan Cameron wrote:
> > > > On Fri, 17 Jan 2025 10:29:57 +0800
> > > > Xiaofei Tan <tanxiaofei@xxxxxxxxxx> wrote:
> > > >
> > > >> When the module HED is built-in, the module HED init is behind EVGED
> > > >> as the driver are in the same initcall level, then the order is determined
> > > >> by Makefile order. That order violates expectations. Because RAS records
> > > >> can't be handled in the special time window that EVGED has initialized
> > > >> while HED not.
> > > >>
> > > >> If the number of such RAS records is more than the APEI HEST error source
> > > >> number, the HEST resources could be occupied all, and then could affect
> > > >> subsequent RAS error reporting.
> > > >>
> > > >> Change the initcall level of HED to subsys_init to fix the issue. If build
> > > >> HED as a module, the problem remains. To solve this problem completely,
> > > >> change the ACPI_HED from tristate to bool.
> > > >>
> > > >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > > > Given the change in approach (even though I reviewed this internally)
> > > > should probably have dropped my RB. Anyhow, consider this me
> > > > giving it again on list.
> > > OK. thanks.
> >
> > Applied as 6.14-rc material with a rewritten changelog and under a new
> > subject: "ACPI: HED: Always initialize before evged".
> >
> > Thanks!
>
> For what it's worth, I just bisected a new error message that I see when
> booting several x86_64 distribution configurations in QEMU to this
> change in -next as commit 19badc4e57c6 ("ACPI: HED: Always initialize
> before evged"):
>
>   $ curl -LSso .config https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/raw/main/config
>
>   $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage
>
>   $ qemu-system-x86_64 \
>       -display none \
>       -nodefaults \
>       -M q35 \
>       -d unimp,guest_errors \
>       -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
>       -kernel arch/x86/boot/bzImage \
>       -initrd rootfs.cpio \
>       -cpu host \
>       -enable-kvm \
>       -m 512m \
>       -smp 8 \
>       -serial mon:stdio
>   ...
>   [    0.535126] Error: Driver 'hardware_error_device' is already registered, aborting...
>   ...
>
> If there is any additional information I can provide or patches I can
> test, I am more than happy to do so. Apologies if this has already been
> reported or resolved, I did a search on the mailing list and I did not
> see anything.

No, it hasn't.

So AFAICS the commit in question needs to do more to switch over hed to
non-modular.

I'll drop it for now, thanks!