On Wed, Jul 5, 2023 at 3:37 PM Holger Hoffstätte <holger@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2023-07-06 00:15, Suren Baghdasaryan wrote:
> > On Mon, Jul 3, 2023 at 6:52 AM Holger Hoffstätte
> > <holger@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> On 2023-07-03 12:47, Jiri Slaby wrote:
> >>> Cc Jacob Young (from kernel bugzilla)
> >>>
> >>> On 30. 06. 23, 19:40, Suren Baghdasaryan wrote:
> >>>> On Fri, Jun 30, 2023 at 1:43 AM Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On 30. 06. 23, 10:28, Jiri Slaby wrote:
> >>>>>> > 2348
> >>>>>> clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
> >>>>>> > 2350 <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
> >>>>>> > 2351 <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
> >>>>>> > 2351 <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
> >>>>>> > 2354 <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
> >>>>>> > 2355 <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
> >>>>>> > 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE,
> >>>>>> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> >>>>>> > 2370 <... mmap resumed>) = 0x7fca68249000
> >>>>>> > 2372 <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
> >>>>>> > 2384 <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
> >>>>>> > 2388 <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
> >>>>>> > 2392 <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
> >>>>>> > 2395 write(2, "runtime: marked free object in s"..., 36 <unfinished
> >>>>>> ...>
> >>>>>>
> >>>>>> I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON
> >>>>>> 0x7fca68249000 - 0x7fca6827ffff and go in thread 2395 thinks for some
> >>>>>> reason 0x7fca6824bec8 in that region is "bad".
> >>>>
> >>>> Thanks for the analysis Jiri.
> >>>> Is it possible from these logs to identify whether 2370 finished the
> >>>> mmap operation before 2395 tried to access 0x7fca6824bec8? That access
> >>>> has to happen only after mmap finishes mapping the region.
> >>>
> >>> Hi,
> >>>
> >>> it's hard to tell, but I assume so.
> >>>
> >>> For now, forget about this go's overly complicated, hard to reproduce case and concentrate on the very nice reduced testcase in:
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=217624
> >>> ;)
> >>>
> >>> FWIW, I can reproduce using the test case too.
> >>>
> >>> thanks,
> >>
> >> As another (admittedly correlation-only) data point, I noticed at least hourly crashes
> >> of Firefox-114 after upgrading to 6.4.1, which had never happened before with 6.3.x.
> >> After reverting 0bff0aaea03e2a3ed6 - with a bit of context fixup due to follow-up
> >> commits in 6.4.1 - it has been rock stable again, for several hours now.
> >
> > Jiri, Holger, would you be able to try
> > https://lore.kernel.org/all/20230705171213.2843068-2-surenb@xxxxxxxxxx/
> > and see if your issues still exist?
>
> Just in time! Not 2 minutes ago I finished rebuilding 6.4.2 + the last version of
> your patches on a second machine (old Intel Sandy Bridge workstation) to be my
> crash test dummy. I removed the BROKEN dependency in mm/Kconfig, manually set
> PER_VMA_LOCK=y and ... it seems to work?! Boots fine, Firefox seems to work
> (but no exhaustive tests yet). I will also rerun a few reboot laps, just to
> exercise this a bit harder and see if something comes up.
>
> Tomorrow I'll also try again on my Zen2 Thinkpad and will report back.
>
> Fingers crossed!

Thanks! This is promising.

>
> cheers
> Holger
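
For reference, the ordering requirement quoted above (the access to 0x7fca6824bec8 may only happen after the mmap in thread 2370 has returned) can be sketched in userspace roughly as below. This is only an illustration and is NOT the reduced test case from the bugzilla entry. The names run_once, mapper, accessor and shared are hypothetical, and threading.Event is an assumed stand-in for whatever synchronisation the Go runtime really uses, which the trace does not show. Unix only.

```python
import mmap
import threading

MAP_SIZE = 262144        # length passed to mmap() in the quoted strace
BASE = 0x7fca68249000    # address mmap returned to thread 2370
ADDR = 0x7fca6824bec8    # address reported as "bad" by thread 2395
OFFSET = ADDR - BASE     # position of that address inside the mapping


def run_once():
    """Map in one thread, touch the mapping from another, return the byte read."""
    shared = {}                   # hypothetical hand-off slot between the threads
    mapped = threading.Event()    # set only after mmap has returned
    result = {}

    def mapper():
        # Anonymous private read/write mapping, same flags as in the trace.
        shared["region"] = mmap.mmap(
            -1, MAP_SIZE,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
        mapped.set()

    def accessor():
        mapped.wait()             # never touch the region before mmap is done
        region = shared["region"]
        region[OFFSET] = 0x5A     # first touch of a fresh anonymous page
        result["byte"] = region[OFFSET]

    # Start the accessor first so it really has to wait for the mapper.
    threads = [threading.Thread(target=accessor),
               threading.Thread(target=mapper)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    shared["region"].close()
    return result["byte"]


if __name__ == "__main__":
    print(hex(OFFSET), hex(run_once()))
```

With correct ordering the accessor always reads back the byte it wrote. The sketch only shows the userspace pattern under discussion and makes no claim about reproducing the reported crashes.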