Re: [PATCH 3/3] mm/uffd: Detect pgtable allocation failures

On 05.01.23 19:01, Nadav Amit wrote:


On Jan 5, 2023, at 12:59 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:

On 05.01.23 04:10, Nadav Amit wrote:
On Jan 4, 2023, at 2:52 PM, Peter Xu <peterx@xxxxxxxxxx> wrote:

Before this patch, when any pgtable allocation failure happened during
change_protection(), the error was ignored by the syscall.  For shmem,
an error was also dumped into the host dmesg.  Two issues with that:

  (1) Doing a trace dump when allocation fails is nowhere near
      graceful.

  (2) The user should be notified of any such error, so the user can
      trap it and decide what to do next, either by retrying, or stopping
      the process properly, or anything else.

For userfault users, this will change the API of UFFDIO_WRITEPROTECT when
a pgtable allocation failure happens.  It should not normally break anyone,
though.  If it breaks, then in good ways.

One man-page update will be on the way to introduce the new -ENOMEM for
UFFDIO_WRITEPROTECT.  Not marking stable so we keep the old behavior on the
5.19-till-now kernels.
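
For readers following the thread, here is a minimal userspace sketch of how a
uffd monitor might trap the new error and decide what to do next, as item (2)
above suggests. It is an illustration only, not part of the patch: the
wp_action enum and the wp_classify() and wp_protect() helpers are hypothetical
names, and treating -ENOMEM as retryable with a bounded number of attempts is
an assumed policy that a real monitor may well not want.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical caller-side policy values; not part of any kernel API. */
enum wp_action { WP_OK, WP_RETRY, WP_GIVE_UP };

/*
 * Map an errno value from UFFDIO_WRITEPROTECT to a policy decision.
 * Assumption: ENOMEM (pgtable allocation failure, per this patch) and
 * EAGAIN are treated as transient; everything else is treated as fatal.
 */
static enum wp_action wp_classify(int err)
{
    switch (err) {
    case 0:
        return WP_OK;
    case EAGAIN:
    case ENOMEM:
        return WP_RETRY;
    default:
        return WP_GIVE_UP;
    }
}

/*
 * Write-protect [addr, addr + len) through the userfaultfd 'uffd',
 * retrying transient failures up to max_tries times. Returns WP_OK on
 * success, WP_RETRY if every attempt failed transiently, and WP_GIVE_UP
 * on any other error (or if max_tries is not positive).
 */
static enum wp_action wp_protect(int uffd, void *addr, size_t len,
                                 int max_tries)
{
    struct uffdio_writeprotect wp = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_WRITEPROTECT_MODE_WP,
    };
    enum wp_action act = WP_GIVE_UP;

    for (int i = 0; i < max_tries; i++) {
        int err = ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) == 0 ? 0 : errno;

        act = wp_classify(err);
        if (act != WP_RETRY)
            break;
    }
    return act;
}
```

A caller that gets WP_RETRY back has exhausted its attempts and can fall back
to stopping the process properly, which is the other option the commit message
mentions.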

I understand that the current assumption is that change_protection() should
fully succeed or fail, and I guess this is the current behavior.
However, to be more “future-proof” perhaps this needs to be revisited.
For instance, UFFDIO_WRITEPROTECT can benefit from the ability to (based on
userspace request) prevent write-protection of pages that are pinned. This is
necessary to allow userspace uffd monitor to avoid write-protection of
O_DIRECT’d memory, for instance, that might change even if a uffd monitor
considers it write-protected.

Just a note that this is pretty tricky IMHO, because:

a) We cannot distinguish "pinned readable" from "pinned writable"
b) We can have false positives ("pinned") even for compound pages due to
   concurrent GUP-fast.
c) Synchronizing against GUP-fast is pretty tricky ... as we learned.
   Concurrent pinning is usually problematic.
d) O_DIRECT still uses FOLL_GET and we cannot identify that. (at least
   that should be figured out at one point)

My prototype used the page-count IIRC, so it had false-positives (but
addressed O_DIRECT).

I suspect GUP-fast is still problematic, I might be wrong.

And yes, precise refinement is complicated. However,
if you need to uffd-wp memory, then without such a mechanism you need to
ensure no kernel/DMA write to these pages is possible. The only other
option I can think of is interposing/seccomp on a variety of syscalls,
to prevent uffd-wp of such memory.

The whole thing reminds me of MADV_DONTNEED+pinning: an application shouldn't do it, because you can only get it wrong :) I know, that's a bad answer.

--
Thanks,

David / dhildenb




