Re: [RFC] Expose a memory poison detector ioctl to user space.


 



Hi Dave,

Thanks for the reply, some comments inline.

On Tue, Apr 26, 2022 at 8:40 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> From your description, you have me mostly convinced that this is
> something that needs to get fixed.  The hardware patrol scrubber(s)
> address the same basic problem, but don't seem to be flexible to your
> specific needs.
>
> But, have hardware vendors been receptive at all to making the patrol
> scrubbers more tunable?

We have discussed the use case in detail with Intel. Improvements are
in progress to address some of the issues, such as the signaling
changes needed to avoid broadcast MCEs. Fundamentally, though, the
throughput we need is not really compatible with the patrol scrubber's
design purpose and architecture.

It's unclear in which hardware generation this need may be addressed.
So for now, we are looking at software-assisted approaches that make
use of the _whole_ CPU.
>
> On 4/25/22 09:34, Jue Wang wrote:
> > /* Could stop and return after the 1st poison is detected */
> > #define MCESCAN_IOCTL_SCAN 0
> >
> > struct SysramRegion {
> >   /* input */
> >   uint64_t first_byte;   /* first page-aligned physical address to scan */
> >   uint64_t length;       /* page-aligned length of memory region to scan */
> >   /* output */
> >   uint32_t poisoned;     /* 1 - a poisoned page is found, 0 - otherwise */
> >   uint32_t poisoned_pfn; /* PFN of the 1st detected poisoned page */
> > };
>
> So, the ioctl() caller has to know the physical address layout of the
> system?

This information is already exposed by the kernel through /proc/iomem
and /proc/zoneinfo.
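
As a rough illustration of how a caller could derive scan ranges from
/proc/iomem, here is a minimal user-space sketch. The names
scan_range, parse_sysram_line and SCAN_PAGE_SIZE are hypothetical and
not part of the proposal, and a 4 KiB base page size is assumed. Note
that /proc/iomem reports all addresses as zero to unprivileged
readers, so the sketch rejects ranges that do not cover a whole page.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SCAN_PAGE_SIZE 4096ULL /* assumption: 4 KiB base pages */

/* Hypothetical mirror of the input fields of the proposed request. */
struct scan_range {
    uint64_t first_byte; /* page-aligned physical start address */
    uint64_t length;     /* page-aligned length in bytes */
};

/*
 * Parse one line of /proc/iomem ("start-end : name", hex, end inclusive).
 * Returns 1 and fills *out when the line is a top-level "System RAM"
 * resource that covers at least one whole page, 0 otherwise.
 */
int parse_sysram_line(const char *line, struct scan_range *out)
{
    uint64_t start, end;
    int consumed = 0;

    if (line[0] == ' ') /* nested resources are indented; skip them */
        return 0;
    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " : %n",
               &start, &end, &consumed) != 2 || consumed == 0)
        return 0;

    const char *name = line + consumed;
    if (strncmp(name, "System RAM", 10) != 0)
        return 0;
    if (name[10] != '\0' && name[10] != '\n')
        return 0;
    if (end < start || end == UINT64_MAX)
        return 0;

    /* Shrink the range inward to whole pages. */
    uint64_t mask = SCAN_PAGE_SIZE - 1;
    uint64_t first = (start + mask) & ~mask;
    uint64_t end_excl = (end + 1) & ~mask;
    if (end_excl <= first)
        return 0;

    out->first_byte = first;
    out->length = end_excl - first;
    return 1;
}
```

A caller would read /proc/iomem line by line with fgets(), pass each
line to parse_sysram_line(), and copy the resulting first_byte and
length into the request structure, possibly split into smaller chunks
to control the scan rate.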

>
> While this is a good start at a conversation, I think you might want to
> back up a bit.  You alluded to a few requirements that you have, like:
>
>  * Adjustable detector resource use based on system utilization
>  * Adjustable scan rate to ensure issues are found at a deterministic
>    rate
>  * Detector must be able to find errors in allocated, in-use memory
>
> What about SEV-SNP or TDX private memory?  It might be unmapped *and*
> limited in how it can be accessed.  For instance, TDX hosts can't
> practically read guest memory.  SEV-SNP hosts have special page mapping
> requirements; the host can't create arbitrary mappings with arbitrary
> mapping sizes.  What would this ioctl() do if asked to scan a TDX guest
> private page?
>

Thanks for raising the UPM case for SEV-SNP / TDX private memory. This
is an area where we would like more feedback and input from experts.

Is reading private memory via the kernel's direct mapping benign for
SEV-SNP and TDX? If so, could this be a way to let SEV-SNP and TDX
use cases benefit from this work while the user space / hypervisor
mapping is still removed?

Otherwise, this feature should be defined as mutually exclusive with
incompatible features. Even in that case, I believe SEV-SNP and TDX may
still benefit from _reactive_ memory poison recovery, provided the MCE
handling and CONFIG_MEMORY_FAILURE still function the same way when an
uncorrectable error raises a #MC.
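
For reference, on the reactive path a user-space process such as a VMM
typically sees a poisoned page as a SIGBUS whose si_code is
BUS_MCEERR_AR (action required) or BUS_MCEERR_AO (action optional). A
minimal sketch of decoding such a signal follows. The poison_event
struct and decode_poison_sigbus() are hypothetical names used only for
illustration, and the reported address is a virtual address in the
receiving process, not a physical one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stdint.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical summary of one memory-failure notification. */
struct poison_event {
    uint64_t addr;       /* start of the affected virtual range */
    uint64_t span;       /* size of the affected range in bytes */
    int action_required; /* 1 for BUS_MCEERR_AR, 0 for BUS_MCEERR_AO */
};

/*
 * Decode a siginfo_t delivered to a SA_SIGINFO handler.
 * Returns 1 and fills *ev for a memory-failure SIGBUS, 0 otherwise.
 */
int decode_poison_sigbus(const siginfo_t *si, struct poison_event *ev)
{
    if (si->si_signo != SIGBUS)
        return 0;
    if (si->si_code != BUS_MCEERR_AR && si->si_code != BUS_MCEERR_AO)
        return 0;

    /* si_addr_lsb is the least significant valid bit of si_addr. */
    unsigned int lsb = (unsigned int)si->si_addr_lsb;
    if (lsb >= 64)
        return 0;

    ev->span = 1ULL << lsb;
    ev->addr = (uint64_t)(uintptr_t)si->si_addr & ~(ev->span - 1);
    ev->action_required = (si->si_code == BUS_MCEERR_AR);
    return 1;
}
```

The function is meant to be called from a handler installed with
sigaction() and SA_SIGINFO. For the action-required case the handler
must not simply return, since the faulting access would be retried.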


> Is doing it from userspace a strict requirement?
>
> Would the detector just read memory?
>
> Are there any other physical addresses which are RAM but should not have
> the detector used on them?
>



