Hi, Tony,

Thank you for your quick reply.

On 2022/3/17 1:29 AM, Luck, Tony wrote:
> On Tue, Mar 08, 2022 at 01:19:12PM +0800, Shuai Xue wrote:
>> Hi folks,
>>
>> If we inject a memory error at a physical memory address, e.g. 0x92f033038,
>> used by a user space process:
>>
>>     echo 0x92f033038 > /sys/kernel/debug/apei/einj/param1
>>     echo 0xfffffffffffff000 > /sys/kernel/debug/apei/einj/param2
>>     echo 0x1 > /sys/kernel/debug/apei/einj/flags
>>     echo 0x8 > /sys/kernel/debug/apei/einj/error_type
>>     echo 1 > /sys/kernel/debug/apei/einj/error_inject
>>
>> Then the following error will be reported in dmesg:
>>
>>     ACPI: [Firmware Bug]: requested region covers kernel memory @ 0x000000092f033038
>>
>> After digging into the einj trigger interface, I think it's a kernel bug.
>
> I think you are right. This isn't the first bug where Linux tries
> to validate addresses supplied by EINJ for Linux to read/write.
>
> I hadn't come across it because I almost always set:
>
> # echo 1 > notrigger
>
> so that I can have some application, or function in the kernel
> trigger the error. Instead of running the EINJ trigger action
> to make it happen right away.

Haha, I know your great test suite, ras-tools. None of its cases are
triggered by the EINJ trigger action. I have learned a lot from it.

>> I am wondering whether we should use kmap to map RAM in acpi_map or add
>> another path to address this issue? Any comment is welcome.
>
> Perhaps just drop the sanity checks? Just trusting the BIOS? Sounds
> radical, but this is validation code where the user is deliberately
> injecting errors. If there are BIOS bugs, then people doing validation
> may be well positioned to find the BIOS people to make them fix
> things.
>
> Problem with this approach is that EINJ calls into the APEI code
> that is used for other things besides error injection for validation.
> So a blanket removal of sanity checks wouldn't be a good idea.

Agreed. A blanket removal of the APEI sanity checks is not a good idea.
How about mapping the memory with kmap instead of the APEI API, and doing
so only in __einj_error_trigger()? Then we would not break the validation
in the APEI code and could still trigger the injected error. I provided
some rough code in my last mail:

> A hacky way to address this issue is to map the RAM with kmap
> instead of apei_exec_pre_map_gars, and read it directly instead of
> calling apei_exec_run.
>
> -	rc = apei_exec_pre_map_gars(&trigger_ctx);
> -	if (rc)
> -		goto out_release;
> +	volatile long *ptr;
> +	long tmp;
> +	unsigned long pfn;
> +	pfn = param1 >> PAGE_SHIFT;
>
> -	rc = apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR);
> +	ptr = kmap(pfn_to_page(pfn));
> +	tmp = *(ptr + (param1 & ~ PAGE_MASK));
>
> -	apei_exec_post_unmap_gars(&trigger_ctx);

Best Regards,
Shuai
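
P.S. To make the address arithmetic in the hack concrete, here is a minimal
user-space sketch, not kernel code. It assumes 4 KiB pages, and the SKETCH_*
macros and sketch_* helpers are hypothetical names that only mirror what
PAGE_SHIFT and PAGE_MASK do in the kernel. One thing it tries to show is that
the in-page offset is a byte offset, so it is applied through a byte pointer
here; in C, adding it to a `long *` would scale it by sizeof(long).

```c
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages. These mirror PAGE_SHIFT/PAGE_MASK for illustration only. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Page frame number: the physical address with the in-page bits shifted out. */
static uint64_t sketch_pfn(uint64_t paddr)
{
	return paddr >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of the address within its page. */
static uint64_t sketch_offset(uint64_t paddr)
{
	return paddr & ~SKETCH_PAGE_MASK;
}

/*
 * Read one long at a byte offset from the start of a mapped page.
 * The base is treated as a byte pointer so the offset is not scaled.
 * The caller must ensure offset + sizeof(long) stays within the page.
 */
static long sketch_read_long(const void *page_base, uint64_t offset)
{
	long value;

	memcpy(&value, (const unsigned char *)page_base + offset, sizeof(value));
	return value;
}
```

For the address from my original report, 0x92f033038, sketch_pfn() gives
0x92f033 and sketch_offset() gives 0x38.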