On Wed, May 10, 2023 at 2:50 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Tue, May 09, 2023 at 01:52:05PM -0700, Anish Moorthy wrote:
> > On Sun, May 7, 2023 at 6:23 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> What I wanted to do is to understand whether there's still chance to
> provide a generic solution. I don't know why you have had a bunch of pmu
> stack showing in the graph, perhaps you forgot to disable some of the perf
> events when doing the test? Let me know if you figure out why it happened
> like that (so far I didn't see), but I feel guilty to keep overloading you
> with such questions.
>
> The major problem I had with this series is it's definitely not a clean
> approach. Say, even if you'll all rely on userapp you'll still need to
> rely on userfaultfd for kernel traps on corner cases or it just won't work.
> IIUC that's also the concern from Nadav.

This is a long thread, so apologies if the following has already been
discussed.

Would per-tid userfaultfd support be a generic solution? i.e. allow
userspace to create a userfaultfd that is tied to a specific task. Any
userfaults encountered by that task use that fd, rather than the
process-wide fd. I'm making the assumption here that each of these fds
would have independent signaling mechanisms/queues, and so this would
solve the scaling problem.

A VMM could use this to create 1 userfaultfd per vCPU and 1 thread per
vCPU for handling userfault requests. This seems like it'd have roughly
the same scalability characteristics as the KVM -EFAULT approach.
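
To make the idea concrete, here is a rough sketch of what the per-vCPU
usage could look like. Nothing like this exists today: the UFFD_TASK_BOUND
flag below is made up purely for illustration, and the struct/function
names are mine. Everything else is the existing userfaultfd ioctl ABI. The
assumption is that each vCPU thread creates its own (task-bound) fd, so
faults it takes land only on that fd's queue, and a dedicated handler
thread drains it:

/* Hypothetical sketch: one task-bound userfaultfd plus one handler thread
 * per vCPU. UFFD_TASK_BOUND is an invented flag -- it does not exist in
 * any kernel -- the rest is the current userfaultfd ABI. */
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define UFFD_TASK_BOUND 0x80000000   /* hypothetical: bind fd to calling task */

struct vcpu_uffd {
    int fd;
    void *guest_mem;                 /* base of the registered region */
    size_t mem_size;
    void *src_page;                  /* page-aligned buffer used as copy source */
    size_t page_size;
};

/* Called on the vCPU thread itself, so the fd is (hypothetically) bound to
 * that task and only receives faults the vCPU takes. */
static int vcpu_uffd_init(struct vcpu_uffd *vu)
{
    struct uffdio_api api = { .api = UFFD_API };
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)vu->guest_mem, .len = vu->mem_size },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };

    vu->fd = syscall(__NR_userfaultfd, O_CLOEXEC | UFFD_TASK_BOUND);
    if (vu->fd < 0)
        return -1;
    if (ioctl(vu->fd, UFFDIO_API, &api) || ioctl(vu->fd, UFFDIO_REGISTER, &reg))
        return -1;
    return 0;
}

/* Body of the per-vCPU handler thread: drains only this vCPU's faults. */
static void *vcpu_uffd_handler(void *arg)
{
    struct vcpu_uffd *vu = arg;
    struct pollfd pfd = { .fd = vu->fd, .events = POLLIN };
    struct uffd_msg msg;

    while (poll(&pfd, 1, -1) > 0) {
        if (read(vu->fd, &msg, sizeof(msg)) != sizeof(msg))
            continue;
        if (msg.event != UFFD_EVENT_PAGEFAULT)
            continue;
        /* Demand-fetch of the page would go here; for the sketch, just
         * resolve the fault by copying a prepared source page in. */
        struct uffdio_copy copy = {
            .dst = msg.arg.pagefault.address & ~((__u64)vu->page_size - 1),
            .src = (unsigned long)vu->src_page,
            .len = vu->page_size,
        };
        ioctl(vu->fd, UFFDIO_COPY, &copy);
    }
    return NULL;
}

The VMM would spawn one handler thread per vCPU over its own struct
vcpu_uffd, so each handler only ever sees faults taken by its paired vCPU
and there's no process-wide wait queue for all vCPUs to contend on, which
is the scaling property I'm after.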