On Thu, Aug 30, 2018 at 11:55 AM, Dave Hansen
<dave.hansen@xxxxxxxxxxxxxxx> wrote:
> On 08/30/2018 10:34 AM, Andy Lutomirski wrote:
>>> But, to keep B's TLB from picking up the entry, I think we can just make
>>> it !Present for a moment. No TLB can cache it, and I believe the same
>>> "don't set Dirty on a !Writable entry" logic also holds for !Present
>>> (modulo a weird erratum or two).
>> Can we get documentation? Pretty please?
>
> The accessed bit description in the SDM looks pretty good to me today:
>
>> Whenever the processor uses a paging-structure entry as part of
>> linear-address translation, it sets the accessed flag in that entry
>> (if it is not already set).
> If it's !Present, it can't be used as part of a translation so can't be
> set. I think that covers the thing I was unsure about.
>
> But, Dirty is a bit, er, muddier, but mostly because it only gets set on
> leaf entries:
>
>> Whenever there is a write to a linear address, the processor sets the
>> dirty flag (if it is not already set) in the paging-structure entry
>> that identifies the final physical address for the linear address
>> (either a PTE or a paging-structure entry in which the PS flag is
>> 1).
>
> That little hunk will definitely need to get updated with something like:
>
>     On processors enumerating support for CET, the processor will only
>     set the dirty flag on paging structure entries in which the W
>     flag is 1.

Can we get something much stronger, perhaps? Like this:

    On processors enumerating support for CET, the processor will write to
    the accessed and/or dirty flags atomically, as if using the LOCK
    CMPXCHG instruction. The memory access, any cached entries in any
    paging-structure caches, and the values in the paging-structure entry
    before and after writing the A and/or D bits will all be consistent.

I'm sure this could be worded better.

The point is that the CPU should, atomically, load the PTE, check if it
allows the access, set A and/or D appropriately, write the new value to
the TLB, and use that value for the access. This is clearly a little
bit slower than what old CPUs could do when writing to an
already-in-TLB writable non-dirty entry, but new CPUs are going to have
to atomically check the W bit. (I assume that even old CPUs will
*atomically* set the D bit as if by LOCK BTS, but this is all very
vague in the SDM IIRC.)