Re: [BUG] kernel side can NOT trigger memory error with einj

Hi, Tony,

Thank you for your quick reply.

On 2022/3/17 1:29 AM, Luck, Tony wrote:
> On Tue, Mar 08, 2022 at 01:19:12PM +0800, Shuai Xue wrote:
>> Hi folks,
>>
>> If we inject a memory error at a physical memory address, e.g. 0x92f033038,
>> used by a user space process:
>>
>> 	echo 0x92f033038 > /sys/kernel/debug/apei/einj/param1
>> 	echo 0xfffffffffffff000 > /sys/kernel/debug/apei/einj/param2
>> 	echo 0x1 > /sys/kernel/debug/apei/einj/flags
>> 	echo 0x8 > /sys/kernel/debug/apei/einj/error_type
>> 	echo 1 > /sys/kernel/debug/apei/einj/error_inject
>>
>> Then the following error will be reported in dmesg:
>>
>>     ACPI: [Firmware Bug]: requested region covers kernel memory @ 0x000000092f033038
>>
>> After digging into einj trigger interface, I think it's a kernel bug.
> 
> I think you are right. This isn't the first bug where Linux tries
> to validate addresses supplied by EINJ for Linux to read/write.
> 
> I hadn't come across it because I almost always set:
> 
> # echo 1 > notrigger
> 
> so that I can have some application, or function in the kernel
> trigger the error. Instead of running the EINJ trigger action
> to make it happen right away.

Haha, I know your great test suite, ras-tools. None of its cases are triggered
by the EINJ trigger action. I have learned a lot from it.

>> I am wondering whether we should use kmap to map RAM in acpi_map or add
>> another path to address this issue. Any comments are welcome.
> 
> Perhaps just drop the sanity checks? Just trusting the BIOS? Sounds
> radical, but this is validation code where the user is deliberately
> injecting errors. If there are BIOS bugs, then people doing validation
> may be well positioned to find the BIOS people to make them fix
> things.
> 
> Problem with this approach is that EINJ calls into the APEI code
> that is used for other things besides error injection for validation.
> So a blanket removal of sanity checks wouldn't be a good idea.

Agreed. A blanket removal of the APEI sanity checks is not a good idea. How
about mapping the memory with kmap instead of the APEI API, but only in
__einj_error_trigger()? Then we would not break the validation in the APEI
code and could still trigger the injected error.

I provided some rough code in my last mail:

> A hacky way to address this issue is to map the RAM with kmap
> instead of apei_exec_pre_map_gars, and to read it directly instead of
> calling apei_exec_run.
> -       rc = apei_exec_pre_map_gars(&trigger_ctx);
> -       if (rc)
> -               goto out_release;
> +       volatile long *ptr;
> +       long tmp;
> +       unsigned long pfn;
> +       pfn = param1 >> PAGE_SHIFT;
>
> -       rc = apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR);
> +       ptr = kmap(pfn_to_page(pfn));
> +       tmp = *(volatile long *)((char *)ptr + (param1 & ~PAGE_MASK));
>
> -       apei_exec_post_unmap_gars(&trigger_ctx);
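
For reference, the address arithmetic used in the snippet above can be
checked in user space. This is only a sketch under the assumption of 4 KiB
pages; the helper names (phys_to_pfn, phys_to_page_offset, read_byte_at) are
made up for illustration and are not kernel APIs. It shows that the in-page
offset is a byte count, so it has to be added to a byte pointer rather than
to a pointer to long, where the addition would be scaled by sizeof(long).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT == 12). */
#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Hypothetical helper: page frame number containing a physical address. */
static uint64_t phys_to_pfn(uint64_t paddr)
{
	return paddr >> PAGE_SHIFT;
}

/* Hypothetical helper: byte offset of a physical address within its page. */
static uint64_t phys_to_page_offset(uint64_t paddr)
{
	return paddr & ~PAGE_MASK;
}

/*
 * Hypothetical helper: read one byte at a byte offset into a mapped page.
 * The offset is applied to an unsigned char pointer, so it is not scaled.
 */
static unsigned char read_byte_at(const void *page, size_t offset)
{
	assert(offset < PAGE_SIZE);
	return *((const volatile unsigned char *)page + offset);
}
```

With param1 = 0x92f033038 this gives a pfn of 0x92f033 and an in-page offset
of 0x38, and masking with the param2 value 0xfffffffffffff000 gives the page
base 0x92f033000.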


Best Regards.
Shuai


