On Sun, May 7, 2023 at 6:23 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> I explained why I think it could be useful to test this in my reply to
> Nadav, do you think it makes sense to you?

Ah, I actually missed your reply to Nadav: I didn't realize you had sent
*two* emails.

> While OTOH if multi-uffd can scale well, then there's a chance of
> general solution as long as we can remove the single-queue
> contention over the whole guest mem.

I don't quite understand your statement here: if we pursue multi-uffd,
then it seems to me that by definition we've removed the single queue(s)
covering all of guest memory, and thus the associated contention. We'd
still have the issue of multiple vCPUs contending for a single UFFD,
though.

But I do share some of your curiosity about multi-uffd performance,
especially since some of my earlier numbers indicated that multi-uffd
doesn't scale linearly, even when each vCPU corresponds to a single
UFFD.

So, I grabbed some more profiles for 32 and 64 vCPUs using the
following command:

  ./demand_paging_test -b 512M -u MINOR -s shmem -v <n> -r 1 -c <1,...,n>

The 32-vCPU config achieves a per-vCPU paging rate of 8.8k. That rate
drops to 3.9k (!) with 64 vCPUs. I don't immediately see the issue from
the traces, but it's safe to say it's definitely not scaling. Since I
applied your fixes from earlier, the prefaulting isn't being counted
against the demand paging rate either.

32-vCPU profile:
https://drive.google.com/file/d/19ZZDxZArhSsbW_5u5VcmLT48osHlO9TG/view?usp=drivesdk

64-vCPU profile:
https://drive.google.com/file/d/1dyLOLVHRNdkUoFFr7gxqtoSZGn1_GqmS/view?usp=drivesdk

Do let me know if you need svg files instead, and I'll try to figure
that out.
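
In case it helps pin down what I mean by "multi-uffd" above: one
userfaultfd per vCPU's slice of guest memory, so MINOR faults on that
slice queue on a per-vCPU fd rather than one shared fd. A minimal
sketch along those lines (not the selftest's actual code; the helper
name and the per-vCPU carving of the address range are just for
illustration):

	#include <fcntl.h>
	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	/* Register one uffd over a single vCPU's slice of guest memory. */
	static int uffd_for_range(void *start, size_t len)
	{
		struct uffdio_api api = {
			.api = UFFD_API,
			.features = UFFD_FEATURE_MINOR_SHMEM,
		};
		struct uffdio_register reg = {
			.range = {
				.start = (unsigned long)start,
				.len = len,
			},
			.mode = UFFDIO_REGISTER_MODE_MINOR,
		};
		int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

		if (uffd < 0)
			return -1;
		if (ioctl(uffd, UFFDIO_API, &api) ||
		    ioctl(uffd, UFFDIO_REGISTER, &reg)) {
			close(uffd);
			return -1;
		}
		/* Poll/read this fd from a dedicated handler thread. */
		return uffd;
	}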