On Sat, 8 Jul 2023 at 15:36, Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Sat, Jul 8, 2023 at 2:18 PM Linus Torvalds
> >
> > Again - maybe I messed up, but it really feels like the missing
> > vma_start_write() was more fundamental, and not some "TLB coherency"
> > issue.
>
> Sounds plausible. I'll try to use the reproducer to verify if that's
> indeed happening here.

I really don't think that's what people are reporting; I was just
trying to make up a completely different case that has nothing to do
with any TLB issues.

My real point was simply this one:

> It's likely there are multiple problematic
> scenarios due to this missing lock though.

Right. That's my issue. I felt your explanation was *too* targeted at
some TLB non-coherency thing, when I think the problem was actually a
much larger "page faults simply must not happen while we're copying
the page tables because data isn't coherent".

The anon_vma case was just meant as another random example of the
other kinds of things I suspect can go wrong, because we're simply not
able to do this whole "copy vma while it's being modified by page
faults".

Now, I agree that the PTE problem is real, and probably the main
thing, ie when we as part of fork() do this:

        /*
         * If it's a COW mapping, write protect it both
         * in the parent and the child
         */
        if (is_cow_mapping(vm_flags) && pte_write(pte)) {
                ptep_set_wrprotect(src_mm, addr, src_pte);
                pte = pte_wrprotect(pte);
        }

and the thing that can go wrong before the TLB flush happens is that -
because the TLBs haven't been flushed yet - some threads in the parent
happily continue to write to the page and don't see the wrprotect
happening.

And then you get into the situation where *some* threads see the page
protections change (maybe they had a TLB flush event on that CPU for
random reasons), and they will take a page fault and do the COW thing
and create a new page.
And all the while *other* threads still see the old writeable TLB
state, and continue to write to the old page.

So now you have a page that gets its data copied *while* somebody is
still writing to it, and the end result is that some write easily gets
lost, and so when that new copy is installed, you see it as data
corruption.

And I agree completely that that is probably the thing that most
people actually saw and reacted to as corruption.

But the reason I didn't like the explanation was that I think this is
just one random example of the more fundamental issue of "we simply
must not take page faults while copying".

Your explanation made me think "stale TLB is the problem", and *that*
was what I objected to. The stale TLB was just one random sign of the
much larger problem. It might even have been the most common symptom,
but I think it was just a *symptom*, not the *cause* of the problem.

And I must have been bad at explaining that, because David Hildenbrand
also reacted negatively to my change.

So I'll happily take a patch that adds more commentary about this, and
gives several examples of the things that go wrong.

               Linus