> On May 13, 2019, at 4:27 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, May 13, 2019 at 09:21:01AM +0000, Nadav Amit wrote:
>>> On May 13, 2019, at 2:12 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>>>> The other thing I was thinking of is trying to detect overlap through
>>>> the page-tables themselves, but we have a distinct lack of storage
>>>> there.
>>>
>>> We might just use some state in the pmd, there's still 2 _pt_pad_[12] in
>>> struct page to 'use'. So we could come up with some tlb generation
>>> scheme that would detect conflict.
>>
>> It is rather easy to come up with a scheme (and I did similar things) if you
>> flush the table while you hold the page-tables lock. But if you batch across
>> page-tables it becomes harder.
>
> Yeah; finding that out now. I keep finding holes :/

You can use Uhlig’s dissertation for inspiration (Section 4.4).

[1] https://www.researchgate.net/publication/36450482_Scalability_of_microkernel-based_systems/download

>
>> Thinking about it while typing, perhaps it is simpler than I think - if you
>> need to flush a range that runs across more than a single table, you are very
>> likely to flush a range of more than 33 entries, so anyhow you are likely to
>> do a full TLB flush.
>
> We can't rely on the 33, that's x86 specific. Other architectures can have
> another (or no) limit on that.

I wonder whether there are architectures that do not invalidate the TLB entry
by entry, experiencing this kind of overhead.

>> So perhaps just avoiding the batching if only entries from a single table
>> are flushed would be enough.
>
> That's near to what Will suggested initially, just flush the entire
> thing when there's a conflict.

One question is how you define a conflict. IIUC, Will suggests that the same
mm marks a conflict. In addition, I suggest that if you only remove a single
entry (or a few), you would just not batch and do the flushing while holding
the page-table lock.
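
To make the combined policy concrete, here is a rough sketch of how the
decision might look. This is not kernel code: every identifier below is made
up for illustration, the "small batch" cutoff is an assumption, and the
ceiling is a per-architecture parameter (33 is only the x86 figure mentioned
above; 0 stands for "no limit").

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy outcomes; these are not kernel identifiers. */
enum flush_policy {
    FLUSH_UNDER_LOCK, /* few entries: flush now, holding the page-table lock */
    FLUSH_BATCHED,    /* defer the flush and batch it */
    FLUSH_FULL        /* give up on ranges and flush the entire TLB */
};

/* Assumed per-architecture tunables. */
struct arch_limits {
    size_t full_flush_ceiling; /* e.g. 33 on x86; 0 means no limit */
    size_t small_batch;        /* assumed cutoff for "a single entry (or a few)" */
};

static enum flush_policy choose_flush_policy(const struct arch_limits *lim,
                                             size_t nr_entries,
                                             bool spans_multiple_tables,
                                             bool conflict_same_mm)
{
    /* Will's suggestion as read above: a conflict on the same mm
     * means flushing the entire thing. */
    if (conflict_same_mm)
        return FLUSH_FULL;

    /* Past the architecture's ceiling, a full flush happens anyhow. */
    if (lim->full_flush_ceiling && nr_entries > lim->full_flush_ceiling)
        return FLUSH_FULL;

    /* Only a few entries from a single table: do not batch. */
    if (!spans_multiple_tables && nr_entries <= lim->small_batch)
        return FLUSH_UNDER_LOCK;

    return FLUSH_BATCHED;
}
```

Keeping the ceiling in a per-architecture structure is one way to avoid
relying on the x86-specific 33.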