On Mon, Sep 14, 2020 at 10:47 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > (Note that it's a commit and has a SHA1, but it's from my "throw-away > tree for testing", so it doesn't have my sign-off or any real commit > message yet: I'll do that once it gets actual testing and comments). Just to keep the list and people who were on this thread informed: Michal ended up doing more benchmarking, and everything seems to line up and yes, that patch continues to work fine with a 'unfairness' value of 5. So I've committed it to my git tree (not pushed out yet, I have other pull requests etc I'm handling too), and we'll see if anybody can come up with a better model for how to avoid the page locking being such a pain. Or if somebody can figure out why fair locking causes problems for that packetdrill load that Matthieu reported. It does strike me that if the main source of contention comes from that "we need to check that the mapping is still valid as we insert the page into the page tables", then the page lock really isn't the obvious lock to use. It would be much more natural to use the mapping->i_mmap_rwsem, I feel. Willy? Your "just check for uptodate without any lock" patch itself feels wrong. That's what we do for plain reads, but the difference is that a read is a one-time event and a race is fine: we get valid data, it's just that it's only valid *concurrently* with the truncate or hole-punching event (ie either all zeroes or old data is fine). The reason faulting a page in is different from a read is that if you then map in a stale page, it might have had the correct contents at the time of the fault, but it will not have the correct contents going forward. So a page-in requires fundamentally stronger locking than a read() does, because of how the page-in causes that "future lifetime" of the page, in ways a read() event does not. But truncation that does page cache removal already requires that i_mmap_rwsem, and in fact the VM already very much uses that (ie when walking the page mapping). The other alternative might be just the mapping->private_lock. It's not a reader-writer lock, but if we don't need to sleep (and I don't think the final "check ->mapping" can sleep anyway since it has to be done together with the page table lock), a spinlock would be fine. Linus