On Tue, Dec 13, 2022 at 10:10 AM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> > I think that one point not mentioned yet is how the in-kernel scanner finds
> > a broken page before the page is marked by PG_hwpoison. Some mechanism
> > similar to mcsafe-memcpy could be used, but maybe memcpy is not necessary
> > because we just want to check the healthiness of pages. So a core routine
> > like mcsafe-read would be introduced in the first patchset (or we already
> > have it)?
>
> I don't think that there is an existing routine to do the mcsafe-read. But it should
> be easy enough to write one. If an architecture supports a way to do this without
> evicting other data from caches, that would be a bonus. X86 has a non-temporal
> read that could be interesting ... but I'm not sure that it would detect poison
> synchronously. I could be wrong, but I expect that you won't see a machine check,
> but you should see the memory controller log a UCNA error reported by a CMCI.
>
> -Tony

To Naoya: yes, we will introduce a new scanning routine. It touches a page
cacheline by cacheline to detect memory errors. Each "touch" is essentially
an ANDQ of the loaded cacheline with 0, so that no user data leaks into the
register.

To Tony: thanks. I think you are referring to PREFETCHNTA before ANDQ
(which we use in our scanning routine to minimize cache pollution)? We
tested the attached scanning draft on Intel Skylake, Cascade Lake, and
Ice Lake CPUs, and the ANDQ instruction does raise an MC synchronously
when an injected memory error is encountered.

To Yazen and Vilas: we haven't tested on any AMD hardware. Do you have any
thoughts on PREFETCHNTA + MC?

/**
 * Detect memory errors within a range of memory.
 *
 * Input:
 *   rdi: starting address of the range.
 *   rsi: exclusive ending address of the range.
 *
 * Output:
 *   eax: X86_TRAP_MC if poisoned memory was encountered,
 *        X86_TRAP_PF if the direct kernel mapping is not established,
 *        0 on success (assume this routine never hits X86_TRAP_DE).
 */
ENTRY(kmcescand_safe_read)
	/* Zero %rax. */
	xor %rax, %rax
1:
	/* Prevent LLC pollution with a non-temporal prefetch hint. */
	prefetchnta (%rdi)
2:
	/*
	 * This andq with the constant rax=0 prevents leaking memory
	 * content (especially userspace memory content like credentials)
	 * into the register.
	 */
	andq (%rdi), %rax
	/*
	 * X86-64 CPUs read memory cacheline by cacheline (64 bytes),
	 * so there is no need to andq each 64-bit word explicitly;
	 * instead, advance directly to the next 64-byte address.
	 */
	add $64, %rdi
	cmp %rdi, %rsi
	jne 1b
3:
	ret

	/*
	 * The exception handler ex_handler_fault fills eax with
	 * the exception vector (e.g. #MC or #PF).
	 */
	_ASM_EXTABLE_FAULT(2b, 3b)
ENDPROC(kmcescand_safe_read)
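For reference, the loop structure can also be sketched in portable C. This is illustrative only (the function name `scan_range_sketch` is made up for this sketch): the real routine has to stay in assembly, because the exception-table entry (_ASM_EXTABLE_FAULT) that converts a #MC/#PF on the load into a return value cannot be expressed in plain C. It does show why the scan cannot leak data: the accumulator starts at 0, and 0 AND x is 0 for any x, so the result is always 0 unless the fault handler rewrites it.

```c
#include <stdint.h>
#include <string.h>

/*
 * C sketch of the scanning loop (hypothetical helper, not kernel code).
 * One 8-byte load per 64-byte cacheline is enough to make the CPU fetch
 * the whole line, which is what triggers the #MC on a poisoned line.
 */
static uint64_t scan_range_sketch(const void *start, const void *end)
{
	const unsigned char *p = start;
	uint64_t acc = 0;	/* stays 0: 0 & x == 0 for any x */
	uint64_t word;

	for (; p != (const unsigned char *)end; p += 64) {
		memcpy(&word, p, sizeof(word));	/* one load per cacheline */
		acc &= word;
	}
	return acc;	/* 0 on success, mirroring the routine's eax */
}
```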