Re: [RFC] Kernel Support of Memory Error Detection.

On Tue, Dec 13, 2022 at 10:10 AM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> > I think that one point not mentioned yet is how the in-kernel scanner finds
> > a broken page before the page is marked by PG_hwpoison.  Some mechanism
> > similar to mcsafe-memcpy could be used, but maybe memcpy is not necessary
> > because we just want to check the healthiness of pages.  So a core routine
> > like mcsafe-read would be introduced in the first patchset (or we already
> > have it)?
>
> I don’t think that there is an existing routine to do the mcsafe-read. But it should
> be easy enough to write one.  If an architecture supports a way to do this without
> evicting other data from caches, that would be a bonus. X86 has a non-temporal
> read that could be interesting ... but I'm not sure that it would detect poison
> synchronously. I could be wrong, but I expect that you won’t see a machine check,
> but you should see the memory controller log a UCNA error reported by a CMCI.
>
> -Tony

To Naoya: yes, we will introduce a new scanning routine. It "touches"
a page one cacheline at a time to detect memory errors. Each "touch"
is essentially an ANDQ of the loaded data with a register holding 0,
so that no user data is left in the register.
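
For illustration only, here is a rough user-space C analogue of that
touch loop. It is a sketch, not the kernel routine: the names
touch_range and CACHELINE_SIZE are hypothetical, __builtin_prefetch is
a GCC/Clang builtin (with locality 0 it is typically compiled to a
non-temporal prefetch on x86), and there is no machine-check recovery
here, so it only shows the access pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines, as in the draft routine. */
#define CACHELINE_SIZE 64

/*
 * Hypothetical user-space analogue of the touch loop.
 * Loads one byte from each cacheline in [start, end) and ANDs it into
 * an accumulator that is always 0, so no loaded data survives.
 * Returns the number of cachelines touched.
 */
static size_t touch_range(const void *start, const void *end)
{
    const unsigned char *p = start;
    const unsigned char *stop = end;
    uint64_t acc = 0;
    size_t touched = 0;

    /* Uses "<" rather than "!=" so an unaligned length still terminates. */
    while (p < stop) {
        /* Read prefetch with no temporal locality (GCC/Clang builtin). */
        __builtin_prefetch(p, 0, 0);
        /* The volatile load forces a real memory access per cacheline. */
        acc &= *(const volatile unsigned char *)p;
        p += CACHELINE_SIZE;
        touched++;
    }
    (void)acc; /* acc is still 0; nothing from the buffer is kept. */
    return touched;
}
```

Calling touch_range(buf, buf + 4096) on a 4 KiB buffer performs one
load per 64-byte line, 64 loads in total.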

To Tony: thanks. I think you are referring to PREFETCHNTA before ANDQ
(which we use in our scanning routine to minimize cache pollution)?
We tested the scanning draft below on Intel Skylake, Cascade Lake,
and Ice Lake CPUs, and the ANDQ instruction does raise an MC
synchronously when it encounters an injected memory error.

To Yazen and Vilas: We haven't tested on any AMD hardware. Do you have
any thoughts on PREFETCHNTA + MC?

/**
 * Detect memory errors within a range of memory.
 *
 * Input:
 * rdi: starting address of the range.
 * rsi: exclusive ending address of the range.
 *
 * Output:
 * eax: X86_TRAP_MC if poisoned memory is encountered,
 *      X86_TRAP_PF if the direct kernel mapping is not established,
 *      0 on success (X86_TRAP_DE is also 0, but this routine is
 *      assumed never to raise it).
 */
ENTRY(kmcescand_safe_read)
  /* Zero %rax. */
  xor %rax, %rax
1:
  /* Prevent LLC pollution with non-temporal prefetch hint. */
  prefetchnta (%rdi)
2:
  /**
   * ANDing into rax, which is always 0, prevents leaking memory
   * content (especially userspace content such as credentials)
   * into a register.
   */
  andq (%rdi), %rax
  /**
   * x86-64 CPUs read memory one cacheline (64 bytes) at a time,
   * so there is no need to andq every 64-bit word; instead advance
   * directly to the next 64-byte address. The loop exits only when
   * rdi equals rsi, so the range length must be a multiple of 64.
   */
  add $64, %rdi
  cmp %rdi, %rsi
  jne 1b
3:
  ret
  /**
   * The exception handler ex_handler_fault fills eax with
   * the exception vector (e.g. #MC or #PF).
   */
  _ASM_EXTABLE_FAULT(2b, 3b)
ENDPROC(kmcescand_safe_read)
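
As a sketch of how a caller might interpret the value left in eax
(the helper name classify_scan and the enum are hypothetical and not
part of the draft; 14 and 18 are the x86 exception vector numbers for
#PF and #MC):

```c
/* x86 exception vector numbers: #PF is 14, #MC is 18. */
#define X86_TRAP_PF 14
#define X86_TRAP_MC 18

/* Hypothetical result codes for a scanner built on the draft routine. */
enum scan_result {
    SCAN_OK,         /* whole range read without a fault */
    SCAN_POISONED,   /* machine check: poisoned memory was hit */
    SCAN_UNMAPPED,   /* page fault: no direct kernel mapping */
    SCAN_UNEXPECTED  /* any other vector; not expected by the draft */
};

/* Hypothetical helper: map the returned trap number to a result code. */
static enum scan_result classify_scan(unsigned int trapnr)
{
    switch (trapnr) {
    case 0:
        return SCAN_OK;
    case X86_TRAP_MC:
        return SCAN_POISONED;
    case X86_TRAP_PF:
        return SCAN_UNMAPPED;
    default:
        return SCAN_UNEXPECTED;
    }
}
```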




