Hello Patrick, On Mon, Oct 12, 2015 at 11:04:11AM -0400, Patrick Donnelly wrote: > Hello Andrea, > > On Mon, Jun 15, 2015 at 1:22 PM, Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote: > > This is an incremental update to the userfaultfd code in -mm. > > Sorry I'm late to this party. I'm curious how a ptrace monitor might > use a userfaultfd to handle faults in all of its tracees. Is this > possible without having each (newly forked) tracee "cooperate" by > creating a userfaultfd and passing that to the tracer? To make the non cooperative usage work, userfaulfd also needs more features to track fork() and mremap() syscalls and such, as the monitor needs to be aware about modifications to the address space of each "mm" is managing and of new forked "mm" as well. So fork() won't need to call userfaultfd once we add those features, but it still doesn't need to know about the "pid". The uffd_msg already has padding to add the features you need for that. Pavel invented and developed those features for the non cooperative usage to implement postcopy live migration of containers. He posted some patchset on the lists too, but it probably needs to be rebased on upstream. The ptrace monitor thread can also fault into the userfault area if it wants to (but only if it's not the userfault manager thread as well). I didn't expect the ptrace monitor to want to be a userfault manager too though. On a side note, the signals the ptrace monitor sends to the tracee (SIGCONT|STOP included) will only be executed by the tracee without waiting for userfault resolution from the userfault manager, if the tracees userfault wasn't triggered in kernel context (and in a non cooperative usage that's not an assumption you can make). If the tracee hits an userfault while running in kernel context, the userfault manager must resolve the userfault before any signal (except SIGKILL of course) can be executed by the tracee. Only SIGKILL is instantly executed by all tracees no matter if it was an userfault in kernel or user context. That may be another reason for not wanting the ptrace monitor and the userfault manager in the same thread (they can still be running in two different threads of the same external process). > Have you considered using one userfaultfd for an entire tree of > processes (signaled through a flag)? Would not a process id included > in the include/uapi/linux/userfaultfd.h:struct uffd_msg be sufficient > to disambiguate faults? I got a private email asking a corollary question about having the faulting IP address in the uffd_msg recently, which I answered and I take opportunity to quote it as well below, as it's somewhat connected with your "pid" question and this adds more context. === At times it's the kernel accessing the page (copy-user get user pages) like if the buffer is a parameter to the write or read syscalls, just to make an example. The IP address triggering the fault isn't necessarily a userland address. Furthermore not even the pid is known, so you don't know which process accessed it. userfaultfd only notifies userland that a certain page is requested and must be mapped ASAP. You don't know why or who touched it. === Now about adding the "pid": the association between "pid" and "mm" isn't so strict in the kernel. You can tell which "pid" shares the same "mm" but if you look from userland, you can't always tell which "mm"(/process) the pid belongs to. At times async io threads or vhost-net threads can impersonate the "mm" and in effect become part of the process and you'd get those random "pid" of kernel threads. It could also be a ptrace that triggers an userfault, with a "pid" that isn't part of the application and the manager must still work seamlessly no matter who or which "pid" triggered the userfault. So overall dealing the "pid"s sounds like not very clean as the same kernel thread "pid" can impersonate multiple "mm" and you wouldn't get the information of which "mm" the "address" belongs to. When userfaultfd() is called, it literally binds to the "mm" the process is running on and it's pid agnostic. Then when a kernel thread impersonating the "mm" faults into the "mm" with get_user_pages or copy_user or when a ptrace faults into the "mm", the userafult manager won't even see the difference. Thanks, Andrea -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html