> On Apr 24, 2023, at 10:54 AM, Anish Moorthy <amoorthy@xxxxxxxxxx> wrote:
>
> On Fri, Apr 21, 2023 at 10:40 AM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>>
>> If I understand the problem correctly, it sounds as if the proper
>> solution should be some kind of range-lock. If that is too heavy, or if
>> the interface can be changed/extended to wake a single address (instead
>> of a range), simpler hashed locks can be used.
>
> Some sort of range-based locking system does seem relevant, although I
> don't see how that would necessarily speed up the delivery of faults
> to UFFD readers: I'll have to think about it more.

Perhaps I misread your issue. Based on the scalability issues you raised, I assumed that the problem you encountered was related to lock contention. I do not know whether you profiled it, but that information would be useful.

Anyhow, if the issue you encountered is mostly the general overhead of delivering userfaultfd events, I ran into that one too. The solution I tried (you can find some old patches) delivers and resolves userfaultfd events through io_uring. The main advantage is that this solution is generic, and its performance is pretty good. The disadvantage is that you do need to dedicate a polling thread/core to handling the userfaultfd events. If you want, I can send you the last iteration of my patches privately for you to give them a spin.

> On the KVM side though, I think there's value in merging
> KVM_CAP_ABSENT_MAPPING_FAULT and allowing performance improvements to
> UFFD itself to proceed separately. It's likely that returning faults
> directly via the vCPU threads will be faster than even an improved
> UFFD, since the former approach basically removes one level of
> indirection. That seems important, given the criticality of the
> EPT-violation path during postcopy. Furthermore, if future performance
> improvements to UFFD involve changes/restrictions to its API, then
> KVM_CAP_ABSENT_MAPPING_FAULT could well be useful anyway.
> Sean did mention that he wanted KVM_CAP_MEMORY_FAULT_INFO in general,
> so I'm guessing (some version of) that will (eventually :) be merged
> in any case.

It is certainly not my call. But if you ask me, introducing a solution for a concrete use-case that requires API changes/enhancements is not guaranteed to be the best approach. It may be better to first fully understand the existing overheads and to agree that there is no cleaner, more general alternative with similar performance. Considering the mess that KVM async-PF introduced, I would be very careful before introducing such API changes.

I did not look too closely at the details, but some things do look slightly strange anyhow (which might just be because I am out of touch with KVM). For instance, returning -EFAULT from KVM_RUN? I would have assumed -EAGAIN would be more appropriate, since the invocation itself did succeed.