On Mon, Apr 24, 2023 at 12:44 PM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>
>
>
> > On Apr 24, 2023, at 10:54 AM, Anish Moorthy <amoorthy@xxxxxxxxxx> wrote:
> >
> > On Fri, Apr 21, 2023 at 10:40 AM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
> >>
> >> If I understand the problem correctly, it sounds as if the proper solution
> >> should be some kind of range-locks. If that is too heavy, or the interface can
> >> be changed/extended to wake a single address (instead of a range),
> >> simpler hashed-locks can be used.
> >
> > Some sort of range-based locking system does seem relevant, although I
> > don't see how that would necessarily speed up the delivery of faults
> > to UFFD readers: I'll have to think about it more.
>
> Perhaps I misread your issue. Based on the scalability issues you raised,
> I assumed that the problem you encountered is related to lock contention.
> I do not know whether you profiled it, but some information would be
> useful.

No, you had it right: the issue at hand is contention on the uffd wait
queues. I'm just not sure what the range-based locking would really be
doing. Events would still have to be delivered to userspace in an ordered
manner, so it seems to me that each uffd would still need to maintain a
queue (and the associated contention).

With respect to the "sharding" idea, I collected some more runs of the
self test (full command in [1]). This time I omitted the "-a" flag, so
that every vCPU accesses a different range of guest memory with its own
UFFD, and set the number of reader threads per UFFD to 1.

vCPUs    Average Paging Rate       Average Paging Rate
         (w/o new caps)            (w/ new caps)
   1     180                       307
   2      85                       220
   4      80                       206
   8      39                       163
  16      18                       104
  32       8                        73
  64       4                        57
 128       1                        37
 256       1                        16

I'm reporting the paging rate on a per-vCPU rather than total basis, which
is why the numbers look so different from the ones in the cover letter.
I'm actually not sure why the demand paging rate falls off with the number
of vCPUs (maybe a prioritization issue on my side?), but even when UFFDs
aren't being contended for it's clear that demand paging via memory fault
exits is significantly faster.

I'll try to get some perf traces as well: that will take a little bit of
time though, as doing it for cycler will involve patching our VMM first.

[1] ./demand_paging_test -b 64M -u MINOR -s shmem -v <n> -r 1 [-w]

> It is certainly not my call. But if you ask me, introducing a solution for
> a concrete use-case that requires API changes/enhancements is not
> guaranteed to be the best solution. It may be better first to fully
> understand the existing overheads and agree that there is no alternative
> cleaner and more general solution with similar performance.
>
> Considering the mess that KVM async-PF introduced, I
> would be very careful before introducing such API changes. I did not look
> too much at the details, but some things anyhow look slightly strange
> (which might be because I am out of touch with KVM). For instance, returning
> -EFAULT from KVM_RUN? I would have assumed -EAGAIN would be more
> appropriate, since the invocation did succeed.

I'm not quite sure whether you're focusing on KVM_CAP_MEMORY_FAULT_INFO
or KVM_CAP_ABSENT_MAPPING_FAULT here. But to my knowledge, none of the
KVM folks have objections to either: hopefully it stays that way, but
we'll have to see :)
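
For readers less familiar with the "sharded" setup the numbers above come
from, here is a rough, hypothetical sketch (not code from the selftest) of
the arrangement: one userfaultfd and one reader thread per vCPU, with MINOR
faults on the shmem backing resolved via UFFDIO_CONTINUE. The helper names
create_vcpu_uffd()/vcpu_uffd_reader() are illustrative only.

/*
 * Hypothetical sketch of a per-vCPU uffd + reader thread, so faults from
 * different vCPUs never contend on a shared wait queue.
 */
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define PAGE_SZ 4096UL

/* Create and register a uffd covering this vCPU's slice of guest memory. */
static int create_vcpu_uffd(void *hva, uint64_t len)
{
	struct uffdio_api api = {
		.api = UFFD_API,
		.features = UFFD_FEATURE_MINOR_SHMEM,
	};
	struct uffdio_register reg = {
		.range = { .start = (uint64_t)(unsigned long)hva, .len = len },
		.mode = UFFDIO_REGISTER_MODE_MINOR,
	};
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);

	ioctl(uffd, UFFDIO_API, &api);
	ioctl(uffd, UFFDIO_REGISTER, &reg);
	return uffd;
}

/* One reader thread per vCPU's uffd. */
static void *vcpu_uffd_reader(void *arg)
{
	int uffd = (int)(intptr_t)arg;
	struct uffd_msg msg;

	for (;;) {
		if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
			continue;
		if (msg.event != UFFD_EVENT_PAGEFAULT)
			continue;

		/* Install the already-present shmem page for this address. */
		struct uffdio_continue cont = {
			.range = {
				.start = msg.arg.pagefault.address & ~(PAGE_SZ - 1),
				.len = PAGE_SZ,
			},
		};
		ioctl(uffd, UFFDIO_CONTINUE, &cont);
	}
	return NULL;
}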
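
And on the -EFAULT question, a hedged sketch of how a VMM vCPU loop might
consume the annotated failure, assuming KVM_CAP_MEMORY_FAULT_INFO fills a
memory_fault field in struct kvm_run when KVM_RUN fails with -EFAULT. The
field names (gpa, len) and the resolve_guest_fault() helper are
placeholders, not the series' actual ABI, and this would only build against
the series' updated headers.

/*
 * Hedged sketch only: the memory_fault layout is assumed, and
 * resolve_guest_fault() stands in for whatever the VMM does
 * (e.g. UFFDIO_CONTINUE) to make the faulting range present.
 */
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int resolve_guest_fault(uint64_t gpa, uint64_t len);	/* hypothetical */

static void vcpu_run_loop(int vcpu_fd, struct kvm_run *run)
{
	for (;;) {
		int ret = ioctl(vcpu_fd, KVM_RUN, 0);

		if (ret < 0 && errno == EFAULT) {
			/* Faulting range is annotated instead of a bare -EFAULT. */
			if (!resolve_guest_fault(run->memory_fault.gpa,
						 run->memory_fault.len))
				continue;	/* re-enter the guest */
			break;
		}
		if (ret < 0)
			break;

		/* ... normal exit handling (KVM_EXIT_IO, KVM_EXIT_MMIO, ...) ... */
	}
}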