On Fri, May 31, 2024 at 1:24 AM Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
>
> On Wed, May 29, 2024 at 03:03:21PM -0600, Yu Zhao wrote:
> > On Wed, May 29, 2024 at 12:05 PM James Houghton <jthoughton@xxxxxxxxxx> wrote:
> > >
> > > Secondary MMUs are currently consulted for access/age information at
> > > eviction time, but before then, we don't get accurate age information.
> > > That is, pages that are mostly accessed through a secondary MMU (like
> > > guest memory, used by KVM) will always just proceed down to the oldest
> > > generation, and then at eviction time, if KVM reports the page to be
> > > young, the page will be activated/promoted back to the youngest
> > > generation.
> >
> > Correct, and as I explained offline, this is the only reasonable
> > behavior if we can't locklessly walk secondary MMUs.
> >
> > Just for the record, the (crude) analogy I used was:
> > Imagine a large room with many bills ($1, $5, $10, ...) on the floor,
> > but you are only allowed to pick up 10 of them (and put them in your
> > pocket). A smart move would be to survey the room *first and then*
> > pick up the largest ones. But if you are carrying a 500 lbs backpack,
> > you would just want to pick up whichever that's in front of you rather
> > than walk the entire room.
> >
> > MGLRU should only scan (or lookaround) secondary MMUs if it can be
> > done lockless. Otherwise, it should just fall back to the existing
> > approach, which existed in previous versions but is removed in this
> > version.
>
> Grabbing the MMU lock for write to scan sucks, no argument there. But
> can you please be specific about the impact of read lock v. RCU in the
> case of arm64? I had asked about this before and you never replied.
>
> My concern remains that adding support for software table walkers
> outside of the MMU lock entirely requires more work than just deferring
> the deallocation to an RCU callback. Walkers that previously assumed
> 'exclusive' access while holding the MMU lock for write must now cope
> with volatile PTEs.
>
> Yes, this problem already exists when hardware sets the AF, but the
> lock-free walker implementation needs to be generic so it can be applied
> for other PTE bits.

Direct reclaim is multi-threaded and each reclaimer can take the mmu
lock for read (testing the A-bit) or write (unmapping before paging
out) on arm64. The fundamental problem of using the readers-writer
lock in this case is priority inversion: the readers have lower
priority than the writers, so ideally, we don't want the readers to
block the writers at all.

Using my previous (crude) analogy: pocketing the bill right in front
of you (the writers) profits immediately whereas searching for the
largest bill (the readers) can be futile.

As I said earlier, I prefer we drop the arm64 support for now, but I
will not object to taking the mmu lock for read when clearing the
A-bit, as long as we fully understand the problem here and document it
clearly.