Re: [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()

On Wed, Oct 02, 2024 at 06:13:47PM GMT, Bert Karwatzki wrote:
> Am Mittwoch, dem 02.10.2024 um 14:23 +0100 schrieb Lorenzo Stoakes:
> > On Wed, Oct 02, 2024 at 01:13:16PM GMT, Lorenzo Stoakes wrote:
> > > On Tue, Oct 01, 2024 at 04:34:00AM GMT, Bert Karwatzki wrote:
> > > > I just noticed (via a bisect between v6.11 and v6.12-rc1) that this patch
> > > > (commit f8d112a4e657 in the linux-next tree) leads to a severe memory corruption
> > > > error under these (rather rare) circumstances:
> > > > 1. Start a 32bit windows game via steam (which uses proton, steam's version of wine)
> > > > 2. When starting the game, the proton version used has to be updated
> > > >
> > > > The effect is the following: the proton update process hangs, the game does
> > > > not start, and even after exiting steam two processes remain, one of them at
> > > > 100% CPU:
> > > > $ ps aux | grep rundll
> > > > bert      222638  1.7  0.1 2054868 87492 ?       Ss   23:14   0:01 C:\windows\syswow64\rundll32.exe setupapi,InstallHinfSection Wow64Install 128 \\?\Z:\mnt\data\.steam\debian-installation\steamapps\common\Proton - Experimental\files\share\wine\wine.inf
> > > > bert      222639 99.8  0.0 2054868 2380 ?        R    23:14   1:01 C:\windows\syswow64\rundll32.exe setupapi,InstallHinfSection Wow64Install 128 \\?\Z:\mnt\data\.steam\debian-installation\steamapps\common\Proton - Experimental\files\share\wine\wine.inf
> > > >
> > > > When trying to kill those processes with "killall rundll32.exe", these errors happen:
> > >
> > > [snip]
> > >
> > > Starting a new thread because lei is totally breaking with all these dmesg
> > > logs and I'm struggling to be able to reply correctly.
> > >
> > > Sorry to make this hard to follow, everyone, but there we go.
> > >
> > > I have tried to recreate the exact series of anon mappings and it is not
> > > triggering for me, so unfortunately I'm going to have to ask you to try
> > > something else.
> > >
> > > This does sort of hint at it being maybe an unusual code path with a file
> > > set (possibly...) - could you try the below patch on fresh next 1st oct?
> > >
> > > You can grep the dmesg for 'LJS' and just provide that if it triggers;
> > > mostly I want to see whether this (unusual) code path triggers. There shouldn't
> > > be any spamming.
> > >
> > > Thanks!
> > >
> >
> > [snip]
> >
> > Ugh, I'm trying this locally and trying to repro now (and not succeeding,
> > unfortunately), and I realise that this _does_ spam, because apparently it's very
> > common for steam to be call_mmap()'ing things into VM_PFNMAP (who knew).
> >
> > Can you try this instead? Thanks!
> >
> > ----8<----
> > From e69b8c05daa20921bd86ef604982297bd2ced663 Mon Sep 17 00:00:00 2001
> > From: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> > Date: Wed, 2 Oct 2024 13:04:55 +0100
> > Subject: [PATCH] ljs: add hacky log output
> >
> > ---
> >  mm/mmap.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index dd4b35a25aeb..e513eb3721a3 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1463,11 +1463,18 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  		 * vma again as we may succeed this time.
> >  		 */
> >  		if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
> > +			unsigned long ljs_start = vma->vm_start;
> > +			unsigned long ljs_end = vma->vm_end;
> > +
> >  			vmg.flags = vma->vm_flags;
> > +
> >  			/* If this fails, state is reset ready for a reattempt. */
> >  			merge = vma_merge_new_range(&vmg);
> >
> >  			if (merge) {
> > +				pr_err("LJS: HIT CASE [%lx, %lx) -> [%lx, %lx) orig flags=[%lu] flags=[%lu]\n",
> > +				       ljs_start, ljs_end, vma->vm_start, vma->vm_end, vm_flags, vma->vm_flags);
> > +
> >  				/*
> >  				 * ->mmap() can change vma->vm_file and fput
> >  				 * the original file. So fput the vma->vm_file
> > --
> > 2.46.2
>
> This gives no output.
>
> Bert Karwatzki
>

OK, that's kind of expected at this point, it seems.

Current status - I literally cannot repro this, even doing exactly what you're
doing, so I wonder whether your exact GPU or the file system you're using is a
factor here - perhaps something implements a custom .mmap callback or a
vm_ops->close() that is somehow interacting with this, or the fact that these
are file-backed mappings is somehow a factor.
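
Just to illustrate the sort of thing I mean, here is a purely hypothetical
sketch (not taken from any real driver, all names invented) of a driver whose
.mmap file operation both rewrites the VMA's flags and installs vm_ops with a
.close callback - either of which would then take part in the unmap/merge
dance in mmap_region():

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/module.h>

/* Hypothetical example only - a driver that both changes vm_flags in its
 * ->mmap handler and supplies a ->close callback. */
static void example_vm_close(struct vm_area_struct *vma)
{
	/* Runs whenever this VMA is unmapped, split or replaced. */
	pr_info("example: close [%lx, %lx)\n", vma->vm_start, vma->vm_end);
}

static const struct vm_operations_struct example_vm_ops = {
	.close = example_vm_close,
};

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	/* Flag changes like this are what make mmap_region() retry the merge
	 * after call_mmap(), i.e. the code path instrumented above. */
	vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
	vma->vm_ops = &example_vm_ops;
	return 0;
}

/* Would be wired up to a char device or similar in a real driver. */
static const struct file_operations example_fops = {
	.owner = THIS_MODULE,
	.mmap  = example_mmap,
};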

Recreating the scenario as best I can with anon mappings, it seems that
immediately before the problem triggers we are in the state shown on the left
below, in the range covered by the problematic node, and immediately afterwards
we are in the state on the right (plus an invalid entry/pivot for 0x68000000).

The final action that triggers the problem is mapping [0x1b90000, 0x1bae000)
PROT_NONE, MAP_RESERVE, which merges with A and D while unmapping B and C (see
the sketch after the map dump below):

01740000-017c0000 ---p 00000000 00:00 0       01740000-017c0000 ---p 00000000 00:00 0
017c0000-01b40000 rw-p 00000000 00:00 0	      017c0000-01b40000 rw-p 00000000 00:00 0
01b40000-01b50000 ---p 00000000 00:00 0	      01b40000-01b50000 ---p 00000000 00:00 0
01b50000-01b56000 rw-p 00000000 00:00 0	      01b50000-01b56000 rw-p 00000000 00:00 0
01b56000-01b60000 ---p 00000000 00:00 0	      01b56000-01b60000 ---p 00000000 00:00 0
01b60000-01b70000 ---p 00000000 00:00 0	      01b60000-01b70000 ---p 00000000 00:00 0
01b70000-01b80000 ---p 00000000 00:00 0	      01b70000-01b80000 ---p 00000000 00:00 0
01b80000-01b86000 rw-p 00000000 00:00 0	      01b80000-01b86000 rw-p 00000000 00:00 0
01b86000-01b90000 ---p 00000000 00:00 0 * A   01b86000-68000000 ---p 00000000 00:00 0
01b90000-01b91000 rwxp 00000000 00:00 0 * B   < invalid 0x68000000 entry/pivot >
01b91000-01bae000 rw-p 00000000 00:00 0 * C
01bae000-68000000 ---p 00000000 00:00 0 * D
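
For reference, this is roughly how I have been approximating that final step
with anon mappings - a hedged sketch only: the addresses come from the dump
above, but the prot/flags (in particular treating the 'MAP_RESERVE'
reservation as PROT_NONE | MAP_NORESERVE anonymous memory) are assumptions on
my part:

#include <stdio.h>
#include <sys/mman.h>

/*
 * Anonymous-mapping approximation of the final step. Only VMAs A-D are set
 * up; their differing protections keep them as four separate VMAs.
 */
static void *fixed_map(unsigned long start, unsigned long end, int prot)
{
	return mmap((void *)start, end - start, prot,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE,
		    -1, 0);
}

int main(void)
{
	fixed_map(0x1b86000, 0x1b90000, PROT_NONE);                          /* A */
	fixed_map(0x1b90000, 0x1b91000, PROT_READ | PROT_WRITE | PROT_EXEC); /* B */
	fixed_map(0x1b91000, 0x1bae000, PROT_READ | PROT_WRITE);             /* C */
	fixed_map(0x1bae000, 0x68000000, PROT_NONE);                         /* D */

	/*
	 * The triggering step: a PROT_NONE mapping over [0x1b90000, 0x1bae000)
	 * replaces B and C and should merge with A and D into a single VMA.
	 */
	if (fixed_map(0x1b90000, 0x1bae000, PROT_NONE) == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	return 0;
}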

It seems, based on some of the VMA flags, that we _must_ be mapping files here,
e.g. some have VM_EXEC and others are missing VM_MAYREAD, which indicates a
read-only file mapping. Given the low addresses, we are probably setting up the
mappings for a binary or some such? That would align with the PROT_NONE
mappings too.
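
As an aside, should it help to double-check this on the live (stuck) process:
these flags are visible from userspace in the VmFlags line of
/proc/<pid>/smaps ('ex' = VM_EXEC, 'mr'/'mw' = VM_MAYREAD/VM_MAYWRITE). A
minimal sketch that dumps each range together with its VmFlags (pass the pid
of the stuck rundll32.exe process as the argument):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Print every mapping in /proc/<pid>/smaps together with its VmFlags line. */
int main(int argc, char **argv)
{
	char path[64], line[512], range[512] = "";
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%s/smaps",
		 argc > 1 ? argv[1] : "self");
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "VmFlags:", 8))
			printf("%s%s\n", range, line);              /* range + flags */
		else if (isxdigit((unsigned char)line[0]) && strchr(line, '-'))
			snprintf(range, sizeof(range), "%s", line); /* header line */
	}

	fclose(f);
	return 0;
}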

This, combined with the fact that I really _cannot_ repro it (on intel GPU
hardware and an ext4 file system), really makes me think that there are some
'special' mappings going on here.

The fact we're unmapping 2 VMAs and then removing a final one in a merge does
suggest something is going wrong in the interaction between these two events.

I wonder if the merge logic is struggling with the (detached, but still
present) VMAs as we try to expand an existing VMA?

Though my money's on a call_mmap() or .close() call doing something weird here.

Investigation carries on...



