On Sun, Jan 9, 2022 at 11:53 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> My original PCID series actually did remove lazy TLB on x86. I don't
> remember why, but people objected. The issue isn't the limited PCID
> space -- IIRC it's just that MOV CR3 is slooooow. If we get rid of
> lazy TLB on x86, then we are writing CR3 twice on even a very short
> idle. That adds maybe 1k cycles, which isn't great.

Yeah, my gut feel is that lazy-TLB almost certainly makes sense on x86.

And the grab/mmput overhead and associated cacheline ping-pong is (I
think) something we could just get rid of on x86 due to the IPI model.
There are probably other costs to lazy TLB, and I can imagine that
there are other maintenance costs, but yes, cr3 moves have always been
expensive on x86 even aside from the actual TLB switch.

But I could easily imagine the situation being different on arm64,
for example.

But numbers beat "gut feel" and "easily imagine" every time. So it
would be kind of nice to have that ...

Linus