Re: [PATCH v3 4/8] mm: userfaultfd: add new UFFDIO_POISON ioctl

On Fri, Jul 7, 2023 at 6:37 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Thu, Jul 06, 2023 at 03:50:32PM -0700, Axel Rasmussen wrote:
> > The basic idea here is to "simulate" memory poisoning for VMs. A VM
> > running on some host might encounter a memory error, after which some
> > page(s) are poisoned (i.e., future accesses SIGBUS). They expect that
> > once poisoned, pages can never become "un-poisoned". So, when we live
> > migrate the VM, we need to preserve the poisoned status of these pages.
> >
> > When live migrating, we try to get the guest running on its new host as
> > quickly as possible. So, we start it running before all memory has been
> > copied, and before we're certain which pages should be poisoned or not.
> >
> > So the basic way to use this new feature is:
> >
> > - On the new host, the guest's memory is registered with userfaultfd, in
> >   either MISSING or MINOR mode (doesn't really matter for this purpose).
> > - On any first access, we get a userfaultfd event. At this point we can
> >   communicate with the old host to find out if the page was poisoned.
> > - If so, we can respond with a UFFDIO_POISON - this places a swap marker
> >   so any future accesses will SIGBUS. Because the pte is now "present",
> >   future accesses won't generate more userfaultfd events, they'll just
> >   SIGBUS directly.
> >
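
To make the flow in the list above concrete, here is a toy userspace model of it. It does not call userfaultfd at all: every name in it (page_state, access_page, resolve_fault, guest_access, src_poisoned) is hypothetical and only stands in for the real page table state and the real UFFDIO_COPY / UFFDIO_POISON responses. It is a sketch of the intended behavior, not the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel or uapi symbols. */
enum page_state {
    PAGE_NONE,      /* nothing installed yet; an access raises a uffd event */
    PAGE_MAPPED,    /* contents were copied over from the old host */
    PAGE_POISONED   /* a poison marker was installed */
};

enum access_result {
    ACCESS_OK,
    ACCESS_UFFD_EVENT,
    ACCESS_SIGBUS
};

/* What a guest access to a page in the given state results in. */
static enum access_result access_page(enum page_state st)
{
    switch (st) {
    case PAGE_NONE:     return ACCESS_UFFD_EVENT;
    case PAGE_POISONED: return ACCESS_SIGBUS;
    default:            return ACCESS_OK;
    }
}

/*
 * Resolve a uffd event for page idx: look up whether the old host had the
 * page poisoned (modelled as an array lookup), then stand in for either
 * UFFDIO_POISON or UFFDIO_COPY by updating the modelled state.
 */
static void resolve_fault(enum page_state *pages, const bool *src_poisoned,
                          size_t idx)
{
    assert(pages[idx] == PAGE_NONE);
    pages[idx] = src_poisoned[idx] ? PAGE_POISONED : PAGE_MAPPED;
}

/* One guest access, resolving at most one uffd event along the way. */
static enum access_result guest_access(enum page_state *pages,
                                       const bool *src_poisoned, size_t idx)
{
    enum access_result r = access_page(pages[idx]);

    if (r == ACCESS_UFFD_EVENT) {
        resolve_fault(pages, src_poisoned, idx);
        r = access_page(pages[idx]);  /* the faulting access is retried */
    }
    return r;
}
```

The property worth noticing is that once a page is in the poisoned state, access_page() never reports another uffd event for it; later accesses go straight to the SIGBUS outcome.
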
> > UFFDIO_POISON does not handle unmapping previously-present PTEs. This
> > isn't needed, because during live migration we want to intercept
> > all accesses with userfaultfd (not just writes, so WP mode isn't useful
> > for this). So whether minor or missing mode is being used (or both), the
> > PTE won't be present in any case.
> >
> > Similarly, UFFDIO_POISON won't replace existing PTE markers. This might
> > be okay to do, but it seems to be safer to just refuse to overwrite any
> > existing entry (like a UFFD_WP PTE marker).
> >
> > Signed-off-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
>
> I agree the current behavior is not very clear, especially after hwpoison
> was introduced.
>
> uffdio-copy is special right now in that it can overwrite a marker, so a
> buggy userapp can also overwrite a poisoned entry. But that also means the
> userapp is already broken, so it may not really matter much.
>
> Zeropage, on the other hand, doesn't do that. I think that was just
> overlooked - I assume it otherwise has the same reasoning as uffdio-copy,
> and either no one has used zeropage over a wp marker yet, or they worked
> around it with unprotect+zeropage.
>
> Not yet sure whether it'll make sense to unify this a bit, but making the
> new poison api strict looks fine.  If you have any thoughts after
> reading feel free to keep the discussion going, I can ack this one I think
> (besides my rename request in 1st patch):

Agreed, it would be nice to unify things. In my v2 I had anon/shmem
and hugetlbfs behaving differently in this respect, for the same
reason - it was just overlooked / cargo culted from existing code. If
nothing else I think a single ioctl should be consistent across memory
types! Heh.

But I also think you're right and it's not exactly intentional that
copy / zeropage / etc are different in this respect. Some unification
would be nice, although I'm not 100% sure what that looks like
concretely.

My rule of thumb is, in cases where we can't imagine a real use case,
it's better to be too strict rather than too loose. And in the future,
it's less disruptive to loosen restrictions rather than tighten them
(potentially breaking something which used to work).
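
As a rough illustration of the "too strict" side of that trade-off, here is a toy model of the policy: a poison marker is only installed into a completely empty entry, and anything already there (including another marker) causes a refusal. The enum, the function name and the choice of -EEXIST as the error are all assumptions made for this sketch; it is not the kernel code.

```c
#include <errno.h>

/* Hypothetical model of what a page table entry holds; not kernel types. */
enum pte_model {
    PTE_NONE,           /* empty entry */
    PTE_PRESENT,        /* a mapped page */
    PTE_WP_MARKER,      /* e.g. a UFFD_WP PTE marker */
    PTE_POISON_MARKER   /* the marker a poison request installs */
};

/*
 * Strict policy: install a poison marker only into an empty entry.
 * Returns 0 on success, or -EEXIST (this sketch's choice of error) if
 * anything at all is already installed. The entry is left untouched
 * on failure.
 */
static int model_poison(enum pte_model *pte)
{
    if (*pte != PTE_NONE)
        return -EEXIST;

    *pte = PTE_POISON_MARKER;
    return 0;
}
```

Loosening this later (say, by also accepting PTE_WP_MARKER) would only turn some current failures into successes, so callers that work today would keep working. Going the other way would turn successes into failures.
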

I'll leave untangling this to some future series, though.

>
> Acked-by: Peter Xu <peterx@xxxxxxxxxx>
>
> --
> Peter Xu
>



