Re: [PATCH v4 29/33] x86/mm: try VMA lock-based page fault handling first


 



On 2023-07-06 00:15, Suren Baghdasaryan wrote:
On Mon, Jul 3, 2023 at 6:52 AM Holger Hoffstätte
<holger@xxxxxxxxxxxxxxxxxxxxxx> wrote:

On 2023-07-03 12:47, Jiri Slaby wrote:
Cc Jacob Young (from kernel bugzilla)

On 30. 06. 23, 19:40, Suren Baghdasaryan wrote:
On Fri, Jun 30, 2023 at 1:43 AM Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:

On 30. 06. 23, 10:28, Jiri Slaby wrote:
   > 2348  clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
   > 2350  <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
   > 2351  <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
   > 2351  <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
   > 2354  <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
   > 2355  <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
   > 2370  mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
   > 2370  <... mmap resumed>)               = 0x7fca68249000
   > 2372  <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
   > 2384  <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
   > 2388  <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
   > 2392  <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
   > 2395  write(2, "runtime: marked free object in s"..., 36 <unfinished ...>

I.e. IIUC, all are threads (CLONE_VM), thread 2370 mapped ANON
0x7fca68249000 - 0x7fca68288fff (262144 bytes), and Go in thread 2395
thinks for some reason that 0x7fca6824bec8 in that region is "bad".
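
The range check above can be redone mechanically. This is only a small sketch using the numbers from the strace excerpt (mmap return value, length, and the reported address); the helper names and the 4 KiB page size are assumptions made for illustration, not anything taken from the kernel or the Go runtime.

```python
# Values copied from the strace excerpt in this thread.
MAP_START = 0x7fca68249000   # return value of mmap() in thread 2370
MAP_LENGTH = 262144          # length argument passed to mmap()
BAD_ADDR = 0x7fca6824bec8    # address reported as "bad" in thread 2395
PAGE_SIZE = 4096             # assumption: 4 KiB base pages


def mapping_end(start: int, length: int) -> int:
    """Return the last byte address covered by the mapping (inclusive)."""
    return start + length - 1


def contains(start: int, length: int, addr: int) -> bool:
    """True if addr falls inside [start, start + length)."""
    return start <= addr < start + length


def page_index(start: int, addr: int, page_size: int = PAGE_SIZE) -> int:
    """Zero-based index of the page within the mapping that holds addr."""
    return (addr - start) // page_size


if __name__ == "__main__":
    print(hex(mapping_end(MAP_START, MAP_LENGTH)))
    print(contains(MAP_START, MAP_LENGTH, BAD_ADDR))
    print(page_index(MAP_START, BAD_ADDR))
```

This only confirms that the address lies inside the anonymous mapping; it says nothing about whether the mmap had completed before the access.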

Thanks for the analysis Jiri.
Is it possible from these logs to identify whether 2370 finished the
mmap operation before 2395 tried to access 0x7fca6824bec8? That access
has to happen only after mmap finishes mapping the region.

Hi,

it's hard to tell, but I assume so.

For now, forget about this overly complicated, hard-to-reproduce Go case and concentrate on the very nice reduced testcase in:
   https://bugzilla.kernel.org/show_bug.cgi?id=217624
;)

FWIW, I can reproduce using the test case too.

thanks,

As another (admittedly correlation-only) data point, I noticed at least hourly crashes
of Firefox-114 after upgrading to 6.4.1, which had never happened before with 6.3.x.
After reverting 0bff0aaea03e2a3ed6 - with a bit of context fixup due to follow-up
commits in 6.4.1 - it has been rock stable again, for several hours now.

Jiri, Holger, would you be able to try
https://lore.kernel.org/all/20230705171213.2843068-2-surenb@xxxxxxxxxx/
and see if your issues still exist?

Just in time! Not 2 minutes ago I finished rebuilding 6.4.2 + the last version of
your patches on a second machine (old Intel Sandy Bridge workstation) to be my
crash test dummy. I removed the BROKEN dependency in mm/Kconfig, manually set
PER_VMA_LOCK=y and ... it seems to work?! Boots fine, Firefox seems to work
(but no exhaustive tests yet). I will also rerun a few reboot laps, just to
exercise this a bit harder and see if something comes up.
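
For anyone repeating this test, it may be worth confirming that the option really ended up enabled in the built kernel. A minimal sketch, assuming the config text comes from a file such as /boot/config-<release> or from /proc/config.gz (the latter only exists with CONFIG_IKCONFIG_PROC); the function names are hypothetical helpers, not part of any kernel tooling.

```python
import gzip


def per_vma_lock_enabled(config_text: str) -> bool:
    """Return True if the kernel config text sets CONFIG_PER_VMA_LOCK=y.

    A disabled option appears as '# CONFIG_PER_VMA_LOCK is not set'
    or is missing entirely, so only an exact '=y' line counts.
    """
    return any(
        line.strip() == "CONFIG_PER_VMA_LOCK=y"
        for line in config_text.splitlines()
    )


def read_config(path: str) -> str:
    """Read a plain or gzip-compressed kernel config file as text."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return fh.read()


if __name__ == "__main__":
    # Assumption: the running kernel exposes its config here.
    print(per_vma_lock_enabled(read_config("/proc/config.gz")))
```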

Tomorrow I'll also try again on my Zen2 Thinkpad and will report back.

Fingers crossed!

cheers
Holger



