On Mon, Jul 3, 2023 at 6:52 AM Holger Hoffstätte <holger@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2023-07-03 12:47, Jiri Slaby wrote:
> > Cc Jacob Young (from kernel bugzilla)
> >
> > On 30. 06. 23, 19:40, Suren Baghdasaryan wrote:
> >> On Fri, Jun 30, 2023 at 1:43 AM Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:
> >>>
> >>> On 30. 06. 23, 10:28, Jiri Slaby wrote:
> >>>> > 2348 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
> >>>> > 2350 <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
> >>>> > 2351 <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
> >>>> > 2351 <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
> >>>> > 2354 <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
> >>>> > 2355 <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
> >>>> > 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> >>>> > 2370 <... mmap resumed>) = 0x7fca68249000
> >>>> > 2372 <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
> >>>> > 2384 <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
> >>>> > 2388 <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
> >>>> > 2392 <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
> >>>> > 2395 write(2, "runtime: marked free object in s"..., 36 <unfinished ...>
> >>>>
> >>>> I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON
> >>>> 0x7fca68249000 - 0x7fca6827ffff and go in thread 2395 thinks for some
> >>>> reason 0x7fca6824bec8 in that region is "bad".
> >>
> >> Thanks for the analysis Jiri.
> >> Is it possible from these logs to identify whether 2370 finished the
> >> mmap operation before 2395 tried to access 0x7fca6824bec8?
> >> That access has to happen only after mmap finishes mapping the region.
> >
> > Hi,
> >
> > it's hard to tell, but I assume so.
> >
> > For now, forget about this go's overly complicated, hard to reproduce case and concentrate on the very nice reduced testcase in:
> > https://bugzilla.kernel.org/show_bug.cgi?id=217624
> > ;)
> >
> > FWIW, I can reproduce using the test case too.
> >
> > thanks,
>
> As another (admittedly correlation-only) data point, I noticed at least hourly crashes
> of Firefox-114 after upgrading to 6.4.1, which had never happened before with 6.3.x.
> After reverting 0bff0aaea03e2a3ed6 - with a bit of context fixup due to follow-up
> commits in 6.4.1 - it has been rock stable again, for several hours now.

Jiri, Holger, would you be able to try
https://lore.kernel.org/all/20230705171213.2843068-2-surenb@xxxxxxxxxx/
and see if your issues still exist?

>
> cheers
> Holger
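The address arithmetic behind the strace analysis quoted above can be checked directly. Thread 2370's mmap() of 262144 bytes (0x40000) returned 0x7fca68249000, so the mapping covers 0x7fca68249000 through 0x7fca68288fff (slightly wider than the 0x7fca6827ffff upper bound quoted above), and the address the Go runtime flagged, 0x7fca6824bec8, lies 0x2ec8 bytes into it. The sketch below only restates those logged values; the helper name `in_mapping` is illustrative and not part of any kernel or Go API.

```python
# Values copied from the strace excerpt quoted above.
MAP_BASE = 0x7FCA68249000  # return value of mmap() in thread 2370
MAP_LEN = 262144           # length argument passed to mmap() (0x40000)
BAD_ADDR = 0x7FCA6824BEC8  # address the Go runtime complained about in thread 2395


def in_mapping(base: int, length: int, addr: int) -> bool:
    """Return True if addr lies in the half-open range [base, base + length)."""
    return base <= addr < base + length


# Last byte covered by the anonymous mapping.
last_byte = MAP_BASE + MAP_LEN - 1

print(f"mapping: {MAP_BASE:#x} - {last_byte:#x}")
print(f"offset of flagged address: {BAD_ADDR - MAP_BASE:#x}")
print(f"inside mapping: {in_mapping(MAP_BASE, MAP_LEN, BAD_ADDR)}")
```

This prints `mapping: 0x7fca68249000 - 0x7fca68288fff`, `offset of flagged address: 0x2ec8` and `inside mapping: True`. The arithmetic confirms only that the address is inside the region; it says nothing about whether the mmap() had completed before thread 2395 touched it, which is the open question in the thread.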