On Sat, Jul 8, 2023 at 12:06 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Sat, 8 Jul 2023 at 11:40, Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: > > > > My understanding was that flush_cache_dup_mm() is there to ensure > > nothing is in the cache, so locking VMAs before doing that would > > ensure that no page faults would pollute the caches after we flushed > > them. Is that reasoning incorrect? > > It is indeed incorrect. > > The VIVT caches are fundamentally broken, and we have various random > hacks for them to make them work in legacy situations. > > And that flush_cache_dup_mm() is exactly that: a band-aid to make sure > that when we do a fork(), any previous writes that are dirty in the > caches will have made it to memory, so that they will show up in the > *new* process that has a different virtual mapping. > > BUT! > > This has nothing to do with page faults, or other threads. > > If you have a threaded application that does fork(), it can - and will > - dirty the VIVT caches *during* the fork, and so the whole > "flush_cache_dup_mm()" is completely and fundamentally race wrt any > *new* activity. > > It's not even what it is trying to deal with. All it tries to do is to > make sure that the newly forked child AT LEAST sees all the changes > that the parent did up to the point of the fork. Anything after that > is simply not relevant at all. > > So think of all this not as some kind of absolute synchronization and > cache coherency (because you will never get that on a VIVT > architecture anyway), but as a "for the simple cases, this will at > least get you the expected behavior". > > But as mentioned, for the issue of PER_VMA_LOCK, this is all *doubly* > irrelevant. Not only was it not relevant to begin with (ie that cache > flush only synchronizes parent -> child, not other-threads -> child), > but VIVT caches don't even exist on any relevant architecture because > they are fundamentally broken in so many other ways. > > So all our "synchronize caches by hand" is literally just band-aid for > legacy architectures. I think it's mostly things like the old broken > MIPS chips, some sparc32, pa-risc: the "old RISC" stuff, where people > simplified the hardware a bit too much. > > VIVT is lovely for hardware people becasue they get a shortcut. But > it's "lovely" in the same way that "PI=3" is lovely. It's simpler - > but it's _wrong_. > > And it's almost entirely useless if you ever do SMP. I guarantee we > have tons of races with it for very fundamental reasons - the problems > it causes for software are not fixable, they are "hidable for the > simple case". > > So you'll also find things like dcache_page_flush(), which flushes > writes to a page to memory. And exactly like the fork() case, it's > *not* real cache coherency, and it's *not* some kind of true global > serialization. > > It's used in cases where we have a particular user that wants the > changes *it* made to be made visible. And exactly like > flush_cache_dup_mm(), it cannot deal with concurrent changes that > other threads make. Thanks for the explanation! It's quite educational. > > > Ok, I think these two are non-controversial: > > https://lkml.kernel.org/r/20230707043211.3682710-1-surenb@xxxxxxxxxx > > https://lkml.kernel.org/r/20230707043211.3682710-2-surenb@xxxxxxxxxx > > These look sane to me. I wonder if the vma_start_write() should have > been somewhere else, but at least it makes sense in context, even if I > get the feeling that maybe it should have been done in some helper > earlier. > > As it is, we randomly do it in other helpers like vm_flags_set(), and > I've often had the reaction that these vma_start_write() calls are > randomly sprinked around without any clear _design_ for where they > are. We write-lock a VMA before any modification. I tried to provide explanations for each such locking in my comments/patch descriptions but I guess I haven't done a good job at that... > > > and the question now is how we fix the fork() case: > > https://lore.kernel.org/all/20230706011400.2949242-2-surenb@xxxxxxxxxx/ > > (if my above explanation makes sense to you) > > See above. That patch is nonsensical. Trying to order > flush_cache_dup_mm() is not about page faults, and is fundamentally > not doable with threads anyway. > > > https://lore.kernel.org/all/20230705063711.2670599-2-surenb@xxxxxxxxxx/ > > This is the one that makes sense to me. Ok, I sent you 3-patch series with the fixes here: https://lore.kernel.org/all/20230708191212.4147700-1-surenb@xxxxxxxxxx/ Do you want me to disable per-VMA locks by default as well? > > Linus