Re: [PATCH 1/2] mm/vma: fix gap check for unmapped_area with VM_GROWSDOWN

You have a subject line of 'fix gap check for unmapped_area with
VM_GROWSDOWN'. I'm not sure this is quite accurate.

I don't really have time to do a deep dive (again, this is why it's so
important to give a decent commit message - explaining under what _real
world_ circumstances this will be used etc.).

But anyway, it seems it will only be the case if MMF_TOPDOWN is not set in
the mm flags, which usually requires an mmap compatibility mode to achieve
unless the arch otherwise forces it.

And these arches would be ones where the stack grows UP, right? Or at least
ones where this is possible?

So already we're into specifics - either arches that grow the stack up, or
ones that intentionally use the old mmap compatibility mode are affected.

This happens in:

[ pretty much all unmapped area callers ]
-> vm_unmapped_area()
-> unmapped_area() (if !(info->flags & VM_UNMAPPED_AREA_TOPDOWN))

Where VM_UNMAPPED_AREA_TOPDOWN is only not set in the circumstances
mentioned above.

So, for this issue you claim is the case to happen, you have to:

1. Either be using a stack grows up arch, or enabling an mmap()
compatibility mode.
2. Also set MAP_GROWSDOWN on the mmap() call, which is translated to
VM_GROWSDOWN.
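
To make condition 2 concrete, here is a minimal userspace sketch of the
kind of call under discussion. map_growsdown() is a hypothetical helper
name, not something from the series. For condition 1, I'd assume you start
the calling program with the ADDR_COMPAT_LAYOUT personality (e.g. via
setarch's --addr-compat-layout option) so the mm uses the legacy bottom-up
layout, but that is an assumption I have not verified against this patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous, private region with
 * MAP_GROWSDOWN, which the kernel translates to VM_GROWSDOWN on the VMA.
 * Returns MAP_FAILED on error, exactly as mmap() does.
 */
void *map_growsdown(size_t len)
{
	assert(len > 0); /* mmap() rejects zero-length mappings */

	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);
}
```

Whether the address search for such a call goes through unmapped_area() or
unmapped_area_topdown() then depends on the layout the mm was given, not
on anything in the call itself.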

We are already far from 'fix gap check for VM_GROWSDOWN' right? I mean I
don't have the time to absolutely dive into the guts of this, but I assume
this is correct right?

I'm not saying we shouldn't address this, but it's VITAL to clarify what
exactly it is you're tackling.

On Mon, Jan 27, 2025 at 07:55:26AM +0000, Wei Yang wrote:
> Current unmapped_area() may fail to get the available area.
>
> For example, we have a vma range like below:
>
>     0123456789abcdef
>       m  m  A m  m

I don't understand this diagram at all. What is going on here? What is 'm',
what is 'A', and what are these values? Page offsets * 0x1000?

Is that a page of memory allocated at each 'm'? Is A somehow an address
under consideration?

You _have_ to add a key and explanation here, my mind reading skills are
much diminished in my old age... :P

>
> Let's assume start_gap is 0x2000 and stack_guard_gap is 0x1000. And now
> we are looking for free area with size 0x1000 within [0x2000, 0xd000].

How can start_gap be 0x2000 when it is only ever 0x1000 at most and only
applicable in x86-64 anyway?

>

It'd be good if you'd shown this on the diagram somehow?

Like this:

  <--------->
0123456789abcdef
  m  m  A m  m

But then I'm confused as to what A is once again.

Ideally you'd actually provide what the struct vm_unmapped_area_info fields
actually are with other parameters and _work through_ an example.

Also you're now talking about a stack but you haven't mentioned the word
'stack' anywhere in any part of this series afaict.

'Fix case where the arch grows stacks upwards or we are in legacy mmap mode
but still want to map a grows-down stack' is a LOT more specific than 'fix
unmapped_area()'.

> The unmapped_area_topdown() could find address at 0x8000, while
> unmapped_area() fails.

OK you need to WORK THROUGH why this is. You're putting all the work on me
as a reviewer to go check to see if this is indeed the case. It's not a
fair distribution of work.

>
> In original code before commit 3499a13168da ("mm/mmap: use maple tree
> for unmapped_area{_topdown}"), the logic is:
>
>   * find a gap with total length, including alignment
>   * adjust start address with alignment
>
> What we do now is:
>
>   * find a gap with total length, including alignment
>   * adjust start address with alignment
>   * then compare the left range with total length

What is 'left range'? This explanation is really hard to follow.

>
> This is not necessary to compare with total length again after start
> address adjusted.
>
> Also, it is not correct to minus 1 here. This may not trigger an issue
> in real world, since address are usually aligned with page size.

You aren't saying why.

Also the VMA's start is _always_ page-aligned.

Presumably the minus 1 is intentionally there to make it an inclusive value
once offset by length?

>
> Fixes: 58c5d0d6d522 ("mm/mmap: regression fix for unmapped_area{_topdown}")

Fixes it how?

> Signed-off-by: Wei Yang <richard.weiyang@xxxxxxxxx>
> CC: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> CC: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> CC: Vlastimil Babka <vbabka@xxxxxxx>
> CC: Jann Horn <jannh@xxxxxxxxxx>
> CC: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> ---
>  mm/vma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index 3f45a245e31b..d82fdbc710b0 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -2668,7 +2668,7 @@ unsigned long unmapped_area(struct vm_unmapped_area_info *info)
>  	gap += (info->align_offset - gap) & info->align_mask;
>  	tmp = vma_next(&vmi);
>  	if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
> -		if (vm_start_gap(tmp) < gap + length - 1) {
> +		if (vm_start_gap(tmp) < gap + info->length) {

Have already spent all morning on this :) Sigh.

OK so let's expand this (again - this is the kind of thing you should do
for a tricky change like this).

info->start_gap is set based on stack_guard_placement() and is either
PAGE_SIZE (0x1000) if VM_SHADOW_STACK is set or 0. This is only applicable
in x86-64.

The align mask is likely to be PAGE_SIZE - 1 but can vary.

length = info->length + align_mask + start_gap

This takes into account the worst possible alignment overhead accounting
for any following VMA also.

gap is equal to the current start of the range under consideration (via
vma_iter_addr) and as well institutes the appropriate alignment and
accounts for start_gap for any _prior_ VMA.

Then we try to find the vm_start_gap() for a candidate _next_ VMA, which if
a stack, uses stack_guard_gap() to SUBTRACT the stack guard gap from
vma->vm_start or account for the shadow stack.

Then we finally have:

if (next->vm_start - stack gap < start_of_range + [preceding gap/alignment requirements] + [worst case length] - 1) {
   Try again
}

Or:

start_of_range + [preceding gap/alignment requirements] > next->vm_start - stack gap - [worst case length] + 1

start of range
  v
  |[preceding gap][ VMA, worst length ][following gap]<stack gap>

Surely the + 1 (which is the -1 in the original calculation) is simply
accounting for the fact that the start of the range is _inclusive_?

You're proposing eliminating entirely the after-this-VMA requirements... why?

This just seems incorrect to me?

You really need to argue the case better if this has some validity,
otherwise I think it's just wrong?
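
For what it's worth, the arithmetic above can be worked through in a
small userspace model. This is only a sketch: struct area_request and all
of the function names are hypothetical stand-ins, the 'next VMA' is
assumed to be a grows-down stack, and the numbers are made up rather than
taken from the commit message. It shows where the existing check and the
proposed one diverge; it does not settle which of them is right.

```c
#include <assert.h>

/* Hypothetical stand-in for the vm_unmapped_area_info fields used here. */
struct area_request {
	unsigned long length;       /* requested size */
	unsigned long align_mask;   /* 2^n - 1, or 0 for no extra alignment */
	unsigned long align_offset;
	unsigned long start_gap;
};

/* Model of vm_start_gap() for a grows-down VMA: pull the start down by
 * the guard gap, clamping at 0 on underflow. */
unsigned long model_vm_start_gap(unsigned long vm_start, unsigned long guard)
{
	return guard > vm_start ? 0 : vm_start - guard;
}

/* length = info->length + align_mask + start_gap */
unsigned long worst_case_length(const struct area_request *r)
{
	return r->length + r->align_mask + r->start_gap;
}

/* Start of the candidate range plus start_gap, then aligned upwards. */
unsigned long aligned_gap(const struct area_request *r, unsigned long start)
{
	unsigned long gap = start + r->start_gap;

	assert((r->align_mask & (r->align_mask + 1)) == 0);
	gap += (r->align_offset - gap) & r->align_mask;
	return gap;
}

/* Existing check: nonzero means 'retry after the next VMA'. */
int current_check_retries(const struct area_request *r, unsigned long start,
			  unsigned long next_start_gap)
{
	return next_start_gap < aligned_gap(r, start) + worst_case_length(r) - 1;
}

/* Check as proposed by the patch. */
int proposed_check_retries(const struct area_request *r, unsigned long start,
			   unsigned long next_start_gap)
{
	return next_start_gap < aligned_gap(r, start) + r->length;
}
```

Take length = 0x1000, align_mask = 0xfff, align_offset = 0, start_gap = 0,
a candidate range starting at 0x8000, and a next VMA at 0xa000 with a
0x1000 stack guard gap, so its vm_start_gap() is 0x9000. The aligned gap
is 0x8000 and the worst case length is 0x1fff. The existing check compares
0x9000 < 0x9ffe and retries; the proposed check compares 0x9000 < 0x9000
and accepts. With align_mask = 0 both accept, so in this model the
difference comes from the alignment slack being counted after gap has
already been aligned.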

>  			low_limit = tmp->vm_end;
>  			vma_iter_reset(&vmi);
>  			goto retry;
> --
> 2.34.1
>



