Re: [BUG] kernel side can NOT trigger memory error with einj


 



On 2022/3/21 10:43 AM, Huang, Ying wrote:
> Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> writes:
> 
>> On 2022/3/18 12:57 AM, Luck, Tony wrote:
>>>> -       rc = apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR);
>>>> +       ptr = kmap(pfn_to_page(pfn));
>>>> +       tmp = *(ptr + (param1 & ~ PAGE_MASK));
>>>
>>> That hack works when the trigger action is just trying to access the injected
>>> location. But on Intel platforms the trigger "kicks" the patrol scrubber in the
>>> memory controller to access the address. So the error is triggered not by
>>> an access from the core, but by internal memory controller access.
>>>
>>> This results in a different error signature (for an uncorrected error injection
>>> it will be a UCNA or SRAO in Intel acronym-speak).
>>
>> As far as I know, APEI only defines five injection instructions: ACPI_EINJ_READ_REGISTER,
>> ACPI_EINJ_READ_REGISTER_VALUE, ACPI_EINJ_WRITE_REGISTER, ACPI_EINJ_WRITE_REGISTER_VALUE and
>> ACPI_EINJ_NOOP. The ACPI_EINJ_TRIGGER_ERROR action should run one of them, and I don't see
>> how any of them would kick the patrol scrubber. For example, a trigger with ACPI_EINJ_READ_REGISTER:
>>
>> apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR)
>>     __apei_exec_run	// ins=0
>>         => apei_exec_read_register
>>             => apei_read
>>                 => acpi_os_read_memory
>>                     => acpi_map_vaddr_lookup    /* lookup VA of PA from acpi_ioremap */
>>                     => acpi_os_ioremap
>>                     => acpi_os_read_iomem
>>                         => *(u32 *) value = readl(virt_addr);
>>
>> As we can see, the error is triggered by access from the core. However, the physical
>> address can NOT be mapped by acpi_os_ioremap.
>>
>> If I missed anything, please let me know. Thank you very much.


> If you write a device register, the device can kick the patrol scrubber
> for you.  This device behavior need not be defined in the APEI spec.

I see, thank you. On our platform, the patrol scrubber triggers a deferred error, while
the fatal error is triggered by an access from the CPU.

> As the names suggest, ACPI_EINJ_READ/WRITE_REGISTER are used to
> read/write device registers via iomem.  They aren't used to read/write
> normal physical memory.  If that's needed, you can try some other method,
> I guess.

I think so. Should we add a new injection instruction to address this problem,
e.g. ACPI_EINJ_READ_MEMORY, implemented with kmap?

By the way, commit fdea163d8c17 ("ACPI, APEI, EINJ, Fix resource conflict on some
machine") removes the injected memory address range, which conflicts with
regular memory, from the trigger table resources. That makes sense when calling
apei_resources_request(). **However, the actual mapping is done by
apei_exec_pre_map_gars() with trigger_ctx, and the conflicting physical address
is still in trigger_ctx.**

		// drivers/acpi/apei/einj.c: __einj_error_trigger
		trigger_param_region = einj_get_trigger_parameter_region(
			trigger_tab, param1, param2);
		if (trigger_param_region) {
			...
		}

If trigger_param_region is valid, which means that the triggered address is in the
ACPI_ADR_SPACE_SYSTEM_MEMORY address space, then we should not use
apei_exec_pre_map_gars() to map it like a register, right? If we had
ACPI_EINJ_READ_MEMORY, the ACPI_EINJ_TRIGGER_ERROR action could be run directly
through ACPI_EINJ_READ_MEMORY.
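
To make the idea concrete, here is a rough userspace sketch of the decision I have in mind. It is not kernel code and not a patch: every name in it (the model_* helpers and the enum) is hypothetical. It assumes 4 KiB pages and that param2 acts as an address mask for memory error types. It only models "does this trigger entry point at the injected RAM range, and if so, which page and offset would a kmap-style read touch".

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; all names are hypothetical, not kernel APIs. */

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

enum model_access {
    MODEL_ACCESS_REGISTER,  /* would use an ioremap-style mapping */
    MODEL_ACCESS_MEMORY,    /* would use a kmap-style mapping of a RAM page */
};

/*
 * Does the trigger entry's address fall in the injected range selected
 * by (param1, param2)?  param2 is treated as an address mask.
 */
static bool model_hits_injected_range(uint64_t entry_addr,
                                      uint64_t param1, uint64_t param2)
{
    return (entry_addr & param2) == (param1 & param2);
}

/* Choose how the trigger entry's address would be accessed. */
static enum model_access model_pick_access(uint64_t entry_addr,
                                           uint64_t param1, uint64_t param2)
{
    return model_hits_injected_range(entry_addr, param1, param2)
        ? MODEL_ACCESS_MEMORY
        : MODEL_ACCESS_REGISTER;
}

/* Page frame number of a physical address. */
static uint64_t model_pfn(uint64_t paddr)
{
    return paddr / MODEL_PAGE_SIZE;
}

/* Offset within the page, the same quantity as (param1 & ~PAGE_MASK). */
static uint32_t model_page_offset(uint64_t paddr)
{
    return (uint32_t)(paddr % MODEL_PAGE_SIZE);
}
```

With param1 = 0x12345000 and a page-granular mask, an entry at 0x12345000 would be classified as a memory access, while an entry at a device address such as 0xfed00000 would stay on the register path.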

Best Regards
Shuai








