> On Apr 20, 2023, at 2:29 PM, Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> Hi, Anish,
>
> [Copied Nadav Amit for the last few paragraphs on userfaultfd, because
> Nadav worked on a few userfaultfd performance problems; so maybe he'll
> also have some ideas around]

Sorry for not following this thread and the previous ones. I skimmed through, and I hope my answers are helpful and relevant…

Anyhow, I also ran into the cost of locking to some extent (the cost of acquiring the locks, not the contention). In my case, I only made a small change to reduce the overhead of acquiring the locks by "combining" the locks of fault_pending_wqh and fault_wqh, which are (almost?) always acquired together. I only acquired the fault_pending_wqh lock and then introduced a "fake_spin_lock()", which only makes lockdep understand that fault_wqh is already protected.

But as I said, this solution was only intended to reduce the cost of locking, and it does not solve the contention.

If I understand the problem correctly, it sounds as if the proper solution would be some kind of range locks. If that is too heavy, or if the interface can be changed/extended to wake a single address (instead of a range), simpler hashed locks can be used.
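
To make the hashed-locks idea a bit more concrete, here is a rough userspace sketch using pthread mutexes instead of kernel spinlocks. It is only an illustration under assumptions: the names (bucket_locks, addr_to_bucket(), lock_addr(), unlock_addr()), the bucket count, and the 4 KiB page size are all hypothetical and do not come from the userfaultfd code. The point is that a single faulting address maps to exactly one lock, so faults on unrelated pages rarely serialize on the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NR_LOCK_BITS     6                    /* hypothetical: 64 buckets */
#define NR_LOCK_BUCKETS  (1u << NR_LOCK_BITS)
#define ASSUMED_PAGE_SHIFT 12                 /* assumption: 4 KiB pages */

/* One lock per bucket instead of one lock for the whole wait queue. */
static pthread_mutex_t bucket_locks[NR_LOCK_BUCKETS];

static inline void hashed_locks_init(void)
{
	for (unsigned int i = 0; i < NR_LOCK_BUCKETS; i++) {
		int ret = pthread_mutex_init(&bucket_locks[i], NULL);
		assert(ret == 0);
		(void)ret; /* silence unused warning when NDEBUG is set */
	}
}

/*
 * Hash the page number of the address, so every address within the
 * same page selects the same bucket. Multiplicative hashing keeps the
 * top NR_LOCK_BITS bits of the product.
 */
static inline unsigned int addr_to_bucket(uintptr_t addr)
{
	uint64_t page = (uint64_t)addr >> ASSUMED_PAGE_SHIFT;

	return (unsigned int)((page * 0x9E3779B97F4A7C15ULL) >>
			      (64 - NR_LOCK_BITS));
}

/* Take only the lock that covers the page containing addr. */
static inline void lock_addr(uintptr_t addr)
{
	pthread_mutex_lock(&bucket_locks[addr_to_bucket(addr)]);
}

static inline void unlock_addr(uintptr_t addr)
{
	pthread_mutex_unlock(&bucket_locks[addr_to_bucket(addr)]);
}
```

A waiter and a waker for the same address would both call lock_addr() on it and therefore meet on the same bucket. This only works if a wake targets a single address: a wake over a range would have to take every bucket the range can hash to, which is why range locks look like the better fit if the interface stays range-based.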