On Tue, Dec 22, 2020 at 04:01:45PM -0800, Linus Torvalds wrote: > On Tue, Dec 22, 2020 at 3:50 PM Linus Torvalds > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > See zap_pte_range() for an example of doing it right, even in the > > presence of complexities (ie that has an example of both flushing the > > TLB, and doing the actual "free the pages after flush", and it does > > the two cases separately). > > The more I look at the mprotect code, the less I like it. We seem to > be much better about the TLB flushes in other places (looking at > mremap, for example). The mprotect code seems to be very laissez-faire > about the TLB flushing. > > Does adding a TLB flush to before that > > pte_unmap_unlock(pte - 1, ptl); > > fix things for you? It definitely does. But if I had to choose, I'd go with holding mmap_lock for write because 1) it's less likely to storm other CPUs by IPI and would only have performance impact on processes that use ufd, which I guess already have high tolerance for not-so-good performance, and 2) people are spearheading multiple efforts to reduce the mmap_lock contention, which hopefully would make ufd users suffer less soon. > That's not the right fix - leaving a stale TLB entry around is fine if > the TLB entry is more strict wrt protections - but it might be worth > testing as a "does it at least close the problem" patch. Well, things get trick if we do this. I'm not sure if I could vouch such a fix for stable as confident as I do holding mmap_lock for write.