On Mon, Jul 3, 2023 at 2:45 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Mon, Jul 3, 2023 at 6:52 AM Holger Hoffstätte
> <holger@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On 2023-07-03 12:47, Jiri Slaby wrote:
> > > Cc Jacob Young (from kernel bugzilla)
> > >
> > > On 30. 06. 23, 19:40, Suren Baghdasaryan wrote:
> > >> On Fri, Jun 30, 2023 at 1:43 AM Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:
> > >>>
> > >>> On 30. 06. 23, 10:28, Jiri Slaby wrote:
> > >>>> > 2348 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
> > >>>> > 2350 <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
> > >>>> > 2351 <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
> > >>>> > 2351 <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
> > >>>> > 2354 <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
> > >>>> > 2355 <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
> > >>>> > 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > >>>> > 2370 <... mmap resumed>) = 0x7fca68249000
> > >>>> > 2372 <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
> > >>>> > 2384 <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
> > >>>> > 2388 <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
> > >>>> > 2392 <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
> > >>>> > 2395 write(2, "runtime: marked free object in s"..., 36 <unfinished ...>
> > >>>>
> > >>>> I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON
> > >>>> 0x7fca68249000 - 0x7fca68288fff and Go in thread 2395 thinks for some
> > >>>> reason 0x7fca6824bec8 in that region is "bad".
> > >>
> > >> Thanks for the analysis Jiri.
> > >> Is it possible from these logs to identify whether 2370 finished the
> > >> mmap operation before 2395 tried to access 0x7fca6824bec8? That access
> > >> has to happen only after mmap finishes mapping the region.
> > >
> > > Hi,
> > >
> > > it's hard to tell, but I assume so.
> > >
> > > For now, forget about this Go's overly complicated, hard to reproduce
> > > case and concentrate on the very nice reduced testcase in:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=217624
> > > ;)
> > >
> > > FWIW, I can reproduce using the test case too.
>
> Thanks for the reproducer, Jiri!
> Let me try it and see if I can figure this one out.

Interestingly, I can't reproduce it under the QEMU emulator (the
reproducer returns 1), but my host machine running the same kernel
reproduces it every time. I will try tracing the major code paths to
see what's going on. I have to leave for the day but will resume in the
evening once I'm home.
Thanks,
Suren.

>
> > >
> > > thanks,
> >
> > As another (admittedly correlation-only) data point, I noticed at least hourly crashes
> > of Firefox-114 after upgrading to 6.4.1, which had never happened before with 6.3.x.
> > After reverting 0bff0aaea03e2a3ed6 - with a bit of context fixup due to follow-up
> > commits in 6.4.1 - it has been rock stable again, for several hours now.
> >
> > cheers
> > Holger
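
For reference, the address arithmetic and the ordering requirement discussed in the quoted analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reproducer from the bugzilla entry: the constants are copied from the quoted strace output, while the helper names (mapping_end, in_mapping, map_then_read) are made up for this sketch. It relies on the Unix-only flags of Python's mmap module.

```python
import mmap
import threading

# Values copied from the quoted strace output.
BASE = 0x7fca68249000   # return value of mmap() in thread 2370
LENGTH = 262144         # length argument of that mmap() call (0x40000)
ADDR = 0x7fca6824bec8   # address mentioned in the analysis


def mapping_end(base, length):
    """Address of the last byte of a mapping of `length` bytes at `base`."""
    return base + length - 1


def in_mapping(addr, base, length):
    """True if `addr` falls inside the mapping [base, base + length)."""
    return base <= addr <= mapping_end(base, length)


def map_then_read(length, offset):
    """Hypothetical helper: one thread maps an anonymous private region,
    a second thread reads one byte from it only after the mapping has
    been published through an Event."""
    ready = threading.Event()
    shared = {}
    result = {}

    def mapper():
        # Same protection and flags as the mmap() call in the strace log.
        shared["region"] = mmap.mmap(
            -1, length,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
        ready.set()

    def reader():
        ready.wait()  # the access happens only after mmap() has returned
        result["byte"] = shared["region"][offset]

    threads = [threading.Thread(target=reader), threading.Thread(target=mapper)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    shared["region"].close()
    return result["byte"]


if __name__ == "__main__":
    print(hex(mapping_end(BASE, LENGTH)))
    print(in_mapping(ADDR, BASE, LENGTH), hex(ADDR - BASE))
    print(map_then_read(LENGTH, ADDR - BASE))
```

On a correctly behaving kernel, a fresh anonymous mapping reads back as zero at every offset, so the reader thread should never see anything else at that offset once the mapping call has returned.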