On Tue, Jul 4, 2023 at 12:11 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Tue, Jul 4, 2023 at 11:05 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >
> > On 04.07.23 19:56, Suren Baghdasaryan wrote:
> > > On Tue, Jul 4, 2023 at 10:36 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> > >>
> > >> On 04.07.23 19:21, Suren Baghdasaryan wrote:
> > >>> On Tue, Jul 4, 2023 at 6:07 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >>>>
> > >>>> On Tue, Jul 04, 2023 at 09:18:18AM +0200, David Hildenbrand wrote:
> > >>>>>> At least the reproducer at
> > >>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217624 is working now. But
> > >>>>>> I wonder if that's the best way to fix this. It's surely simple, but
> > >>>>>> locking every VMA is not free, and doing that on every fork might
> > >>>>>> regress performance.
> > >>>>>
> > >>>>>
> > >>>>> That would mean that we can possibly still get page faults concurrent to
> > >>>>> fork(), on the yet unprocessed part. While that fixes the issue at hand, I
> > >>>>> cannot reliably tell whether this doesn't mess with some other fork()
> > >>>>> corner case.
> > >>>>>
> > >>>>> I'd suggest write-locking all VMAs upfront, before doing any kind of fork-mm
> > >>>>> operation. Just like the old code did. See below.
> > >>>>
> > >>>> Calling fork() from a multi-threaded program is fraught with danger.
> > >>>> It's a rare thing to do, and we don't need to optimise for it. It
> > >>>> does, of course, need to not crash. But we can slow it down as much as
> > >>>> we want to. Slowing down single-threaded programs calling fork is
> > >>>> much less acceptable.
> > >>>
> > >>> Hmm. Would you suggest we use different approaches for multi-threaded
> > >>> vs single-threaded programs?
> > >>> I think locking VMAs while forking a process which has lots of VMAs
> > >>> will regress by some amount (we are adding non-zero work). The
> > >>> question is whether that's acceptable or we have to implement something
> > >>> different. I verified that this solution fixes the issue shown by the
> > >>> reproducer; now I'm trying to quantify the fork performance
> > >>> regression I suspect we will introduce.
> > >>
> > >> Well, the design decision that CONFIG_PER_VMA_LOCK made, for now, is to
> > >> make page faults fast and to make blocking any page faults from happening
> > >> slower (unless there is some easy way that's already built in).
> > >>
> > >> So it wouldn't surprise me if it affects performance a bit, but it remains
> > >> to be quantified whether it really matters in comparison to all the page
> > >> table copying and other stuff we do during fork.
> > >>
> > >> Maybe that can be optimized/sped up later. But for now we should fix
> > >> this the straightforward way. That fix will be (and has to be) a NOP for
> > >> !CONFIG_PER_VMA_LOCK, so that one won't be affected.
> > >>
> > >> Maybe this patch in an adjusted form would still make sense (also to be
> > >> backported), to keep the feature inactive by default until it has
> > >> stabilized a bit more.
> > >
> > > Ok, IIUC your suggestion is to use the VMA-lock-on-fork fix even if
> > > fork() regresses, and to keep CONFIG_PER_VMA_LOCK disabled by default
> > > until it's more stable. That sounds good to me. With that fix, do we
> > > still need to add the BROKEN dependency? I'm guessing it would be
> > > safer to disable for sure.
> >
> > With this fixed, I don't think we need a BROKEN dependency.
> >
> > I'll let you decide if you want to keep it enabled as default; I'd
> > rather disable it for one release and enable it as default later.
> >
> > Happy to learn if taking all VMA locks without any contention causes a
> > lot of harm.
> Ok, average kernel compilation time almost did not change - 0.3%, which
> is well within noise levels.
> My fork test, which mmaps 10000 unmergeable vmas and does 5000 forks in
> a tight loop, shows a regression of about 5%. The test was specifically
> designed to reveal this regression. This does not seem too much to me,
> considering that it's unlikely some program would issue 5000 forks.
> So, I think the numbers are not bad and I'll prepare the patch to add
> VMA locking, but will not add the BROKEN dependency just yet. If someone
> is using an old .config, they probably won't notice the regression, and
> if they do, the only thing they have to do is disable
> CONFIG_PER_VMA_LOCK.
> Of course, if we still get reports about memory corruption, then I'll
> add the BROKEN dependency to disable the feature even for old
> .configs. Let me know if that plan does not work for some reason.

The fix is posted at
https://lore.kernel.org/all/20230704200656.2526715-1-surenb@xxxxxxxxxx/
CC'ing stable for inclusion into the 6.4.y stable branch.

Folks who reported the problem, could you please test and verify the fix?

Thanks,
Suren.

> Thanks,
> Suren.
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
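For reference, a minimal sketch of the "write-lock all VMAs upfront" direction
discussed above, assuming the 6.4 per-VMA-lock helper vma_start_write() and the
for_each_vma() walk that dup_mmap() in kernel/fork.c already performs. This is
illustrative only, not the posted patch; see the lore link above for the actual
fix:

  static __latent_entropy int dup_mmap(struct mm_struct *mm,
                                       struct mm_struct *oldmm)
  {
          ...
          for_each_vma(vmi, mpnt) {
                  struct file *file;

                  /*
                   * Write-lock the parent VMA so that per-VMA-lock page
                   * faults in other threads of the parent are excluded
                   * while its page tables are copied into the child.
                   */
                  vma_start_write(mpnt);
                  ...
          }
          ...
  }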
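And a rough, self-contained userspace sketch of the kind of fork test described
above (many unmergeable VMAs, then forks in a tight loop, timed). The structure,
names, and the alternating-protection trick used to keep the mappings from
merging are assumptions for illustration, not Suren's actual test:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define NR_VMAS		10000
#define NR_FORKS	5000

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	struct timespec start, end;

	/*
	 * Create unmergeable VMAs: alternating protections give adjacent
	 * anonymous mappings different vm_flags, so they cannot be merged
	 * into a single VMA.
	 */
	for (int i = 0; i < NR_VMAS; i++) {
		int prot = (i & 1) ? PROT_READ : PROT_READ | PROT_WRITE;
		void *p = mmap(NULL, page, prot,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < NR_FORKS; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			return 1;
		}
		if (pid == 0)
			_exit(0);	/* child exits immediately */
		waitpid(pid, NULL, 0);
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	printf("%d forks took %.3f s\n", NR_FORKS,
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9);
	return 0;
}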