On Oct 27, 2022, at 11:13 AM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Anybody willing to try to write up the rules (and have each rule
> document *why* it's a rule - not just "by fiat", but an actual "these
> are the rules and this is *why* they are the rules").
>
> Because right now I think all of our rules are almost entirely just
> encoded in the code, with a couple of comments, and a few people who
> just remember why we do what we do.

I think it might be easier to come up with new rules instead of phrasing
the existing ones.

The approach I suggested before [1] is something like:

1. Make x86’s TLB-generation mechanism generic. Turn the TLB-generation
   into a “pending TLB-generation”.

2. For each mm, track a “completed TLB-generation”, which is updated
   whenever an actual flush takes place.

3. When you defer a TLB-flush, while holding the PTL:

   a. Increase the TLB-generation.

   b. Save the updated generation as the “table generation” in a new
      field in the page-table’s page-struct.

4. When you are about to rely on a PTE value that is read from a
   page-table, first check if a TLB flush is needed. The check is
   performed by comparing the “table generation” with the “completed
   generation”. If the “completed generation” is behind the “table
   generation”, a TLB flush is needed.

   [ You rely on the PTE value when you install new PTEs or change them ]

That’s about it. I might not have covered some issues with fast-GUP, but
in general I think it is a simple scheme.

The thing I like most about this scheme is that it avoids relying on
almost all the OS data-structures (e.g., PageAnon()), making it much
easier to grasp.

I can revive the patch-set if the overall approach is agreeable.

[1] https://lore.kernel.org/lkml/20210131001132.3368247-1-namit@xxxxxxxxxx/
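
To make the four steps of the scheme concrete, here is a rough userspace model in C. It is only a sketch under stated assumptions, not code from the patch-set: every name in it (mm_model, pt_page_model, defer_flush(), flush_needed() and so on) is made up for illustration and is not a kernel API, the PTL is reduced to a boolean, the real TLB flush is replaced by a counter, and there is no concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; these are NOT kernel structures. */
struct mm_model {
	uint64_t pending_gen;    /* step 1: bumped whenever a flush is deferred */
	uint64_t completed_gen;  /* step 2: generation covered by the last real flush */
	unsigned int flushes;    /* number of "real" flushes, for illustration only */
};

struct pt_page_model {
	uint64_t table_gen;      /* step 3b: generation saved at the last deferral */
	bool ptl_held;           /* stand-in for the page-table lock (PTL) */
};

static void pt_lock(struct pt_page_model *pt)
{
	assert(!pt->ptl_held);
	pt->ptl_held = true;
}

static void pt_unlock(struct pt_page_model *pt)
{
	assert(pt->ptl_held);
	pt->ptl_held = false;
}

/* Step 2: an actual flush completes everything that was pending so far. */
static void do_flush(struct mm_model *mm)
{
	uint64_t gen = mm->pending_gen;  /* sample before flushing */

	mm->flushes++;                   /* a real TLB flush would go here */
	mm->completed_gen = gen;
}

/* Step 3: defer a flush; the caller must hold the PTL. */
static void defer_flush(struct mm_model *mm, struct pt_page_model *pt)
{
	assert(pt->ptl_held);
	pt->table_gen = ++mm->pending_gen;  /* 3a: increase, 3b: save in the table */
}

/* Step 4: is a flush needed before relying on a PTE from this table? */
static bool flush_needed(const struct mm_model *mm,
			 const struct pt_page_model *pt)
{
	return mm->completed_gen < pt->table_gen;
}

/* Step 4, continued: flush only if the completed generation is behind. */
static bool flush_if_needed(struct mm_model *mm,
			    const struct pt_page_model *pt)
{
	if (!flush_needed(mm, pt))
		return false;
	do_flush(mm);
	return true;
}
```

In this model a caller would take the lock, call defer_flush() instead of flushing, and drop the lock. Any later path that is about to install or change a PTE in that table would call flush_if_needed() first. One flush covers every deferral made before it, so tables that deferred earlier no longer report a needed flush.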