On Mon, Jul 3, 2023 at 8:24 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Mon, Jul 3, 2023 at 2:45 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> >
> > On Mon, Jul 3, 2023 at 6:52 AM Holger Hoffstätte
> > <holger@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On 2023-07-03 12:47, Jiri Slaby wrote:
> > > > Cc Jacob Young (from kernel bugzilla)
> > > >
> > > > On 30. 06. 23, 19:40, Suren Baghdasaryan wrote:
> > > >> On Fri, Jun 30, 2023 at 1:43 AM Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:
> > > >>>
> > > >>> On 30. 06. 23, 10:28, Jiri Slaby wrote:
> > > >>>> > 2348 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
> > > >>>> > 2350 <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
> > > >>>> > 2351 <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
> > > >>>> > 2351 <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
> > > >>>> > 2354 <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
> > > >>>> > 2355 <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
> > > >>>> > 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > >>>> > 2370 <... mmap resumed>) = 0x7fca68249000
> > > >>>> > 2372 <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
> > > >>>> > 2384 <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
> > > >>>> > 2388 <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
> > > >>>> > 2392 <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
> > > >>>> > 2395 write(2, "runtime: marked free object in s"..., 36 <unfinished ...>
> > > >>>>
> > > >>>> I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON
> > > >>>> 0x7fca68249000 - 0x7fca6827ffff and go in thread 2395 thinks for some
> > > >>>> reason 0x7fca6824bec8 in that region is "bad".
> > > >>
> > > >> Thanks for the analysis Jiri.
> > > >> Is it possible from these logs to identify whether 2370 finished the
> > > >> mmap operation before 2395 tried to access 0x7fca6824bec8? That access
> > > >> has to happen only after mmap finishes mapping the region.
> > > >
> > > > Hi,
> > > >
> > > > it's hard to tell, but I assume so.
> > > >
> > > > For now, forget about this go's overly complicated, hard to reproduce case and concentrate on the very nice reduced testcase in:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=217624
> > > > ;)
> > > >
> > > > FWIW, I can reproduce using the test case too.
> >
> > Thanks for the reproducer, Jiri!
> > Let me try it and see if I can figure this one out.
>
> Interestingly I can't reproduce it with qemu emulator (reproducer
> returns 1) but my host machine with the same kernel reproduces it
> every time. Will try tracing the major code paths to see what's going
> on.
> I have to leave for a day but will resume in the evening once I'm home.

I posted a patch to disable per-VMA locks by default for now:
https://lore.kernel.org/all/20230703182150.2193578-1-surenb@xxxxxxxxxx/
Will re-enable them once we figure this issue out.
Thanks,
Suren.

> Thanks,
> Suren.
>
> >
> > > >
> > > > thanks,
> > >
> > > As another (admittedly correlation-only) data point, I noticed at least hourly crashes
> > > of Firefox-114 after upgrading to 6.4.1, which had never happened before with 6.3.x.
> > > After reverting 0bff0aaea03e2a3ed6 - with a bit of context fixup due to follow-up
> > > commits in 6.4.1 - it has been rock stable again, for several hours now.
> > >
> > > cheers
> > > Holger
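

For anyone re-checking the addresses in the strace excerpt quoted in this thread, here is a minimal sketch of the arithmetic. It only uses numbers that appear in the trace (the mmap() return value, the 262144-byte length, and the address named in the analysis); the helper names are hypothetical, and it assumes the 262144-byte length is the whole extent of the mapping. Under that assumption the last mapped byte comes out as 0x7fca68288fff rather than the 0x7fca6827ffff given in the quoted analysis, and 0x7fca6824bec8 falls inside the mapping either way.

```python
# Values copied from the strace excerpt quoted in this thread.
MMAP_BASE = 0x7FCA68249000      # return value of mmap() in thread 2370
MMAP_LEN = 262144               # length argument passed to mmap()
REPORTED_ADDR = 0x7FCA6824BEC8  # address named in the quoted analysis


def mapping_end(base: int, length: int) -> int:
    """Return the last byte (inclusive) covered by a mapping.

    Hypothetical helper; assumes `length` is the full extent of the mapping.
    """
    return base + length - 1


def in_mapping(addr: int, base: int, length: int) -> bool:
    """Return True if addr lies inside [base, base + length)."""
    return base <= addr < base + length


if __name__ == "__main__":
    print("last mapped byte:", hex(mapping_end(MMAP_BASE, MMAP_LEN)))
    print("address inside mapping:", in_mapping(REPORTED_ADDR, MMAP_BASE, MMAP_LEN))
    print("offset into mapping:", hex(REPORTED_ADDR - MMAP_BASE))
```

This says nothing about ordering between the mmap() in thread 2370 and the access in thread 2395; it only confirms which region the address belongs to.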