Re: [PATCH] mm: allow unmapped hole at head side of mbind range

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2019-10-28 at 15:14:51 Hugh Dickins wrote:
>On Thu, 24 Oct 2019, Vlastimil Babka wrote:
>
>> + linux-api
>>
>> On 10/24/19 9:35 AM, Li Xinhai wrote:
>> > From: Li Xinhai  <xinhai.li@xxxxxxxxxxx>
>> >
>> > mbind_range silently ignore unmapped hole at middle and tail of the
>> > specified range, but report EFAULT if hole at head side.
>>
>>
>> Hmm that's unfortunate. mbind() manpage says:
>>
>> EFAULT Part or all of the memory range specified by nodemask and maxnode
>> points outside your accessible address space.  Or, there was an unmapped
>> hole in  the  specified  memory range specified by addr and len.
>>
>> That sounds like any hole inside the specified range should return
>> EFAULT.
>
>Yes (though an exception is allowed when restoring to default).
>
>> But perhaps it can be also interpreted as you suggest, that the
>> whole range is an unmapped hole.  There's some risk of breaking existing
>> userspace if we change it either way.
>>
>> > It is more reasonable to support silently ignore holes at any part of
>> > the range, only report EFAULT if the whole range is in hole.
>> >
>> > Signed-off-by: Li Xinhai <xinhai.li@xxxxxxxxxxx>
>
>Xinhai, I'm sceptical about this patch: is it something you found
>by code inspection, or something you found when using mbind()?
> 
I encountered issue when using mbind (my issue was about using nodemask 
parameter), and then found this special range checking in mbind_range().

>I've not looked long enough to be certain, nor experimented, but:
>
>mbind_range() is only one stage of the mbind() syscall implementation,
>and is preceded by queue_pages_range(): look what queue_pages_test_walk()
>does when MPOL_MF_DISCONTIG_OK not set.
>
>My impression is that mbind_range() is merely correcting an omission
>from the checks already made my queue_pages_test_walk() (an odd way
>to proceed, I admit: would be better to check initially than later).
>
>I do think that you should not make this change without considering
>MPOL_MF_DISCONTIG_OK and its intention.
>
>Hugh
> 

A program was used to reveal issues as below.

#include <stddef.h>
#include <sys/mman.h>
#include <numaif.h>

int main(int argc, char *argv[])
{
    void *mapAddr;
    unsigned long nodemask;

    mapAddr = mmap(NULL, 6*(1<<12), PROT_READ|PROT_WRITE, MAP_PRIVATE|
        MAP_ANONYMOUS, -1, 0);

    // BIND and leave 2 pages as hole in the middle
    nodemask = 0x1;
    mbind(mapAddr, 6*(1<<12), MPOL_BIND, &nodemask, 2, 0);
    munmap(mapAddr+2*(1<<12), 2*(1<<12));

    // part 1
    mbind(mapAddr-1*(1<<12), 2*(1<<12), MPOL_DEFAULT, NULL, 0, 0);
    mbind(mapAddr, 3*(1<<12), MPOL_DEFAULT, NULL, 0, 0);

    // part 2
    nodemask = 0x2;
    mbind(mapAddr+3*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0);
    mbind(mapAddr+4*(1<<12), 3*(1<<12), MPOL_BIND, &nodemask, 3, 0);
    mbind(mapAddr+3*(1<<12), 1*(1<<12), MPOL_BIND, &nodemask, 3, 0);
    mbind(mapAddr+4*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0);

    return 0;
}

syscall results:
     83 mmap(NULL, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd24e13000
     84 mbind(0x7fbd24e13000, 24576, MPOL_BIND, [0x0000000000000001], 2, 0) = 0
     85 munmap(0x7fbd24e15000, 8192)            = 0
// part 1
     86 mbind(0x7fbd24e12000, 8192, MPOL_DEFAULT, NULL, 0, 0) = -1 EFAULT (Bad address)
     87 mbind(0x7fbd24e13000, 12288, MPOL_DEFAULT, NULL, 0, 0) = 0
// part 2
     88 mbind(0x7fbd24e16000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address)
     89 mbind(0x7fbd24e17000, 12288, MPOL_BIND, [0x0000000000000002], 3, 0) = 0
     90 mbind(0x7fbd24e16000, 4096, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address)
     91 mbind(0x7fbd24e17000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = 0

The results on line 86 and line 89 were not correct (other lines were expected):
line 86: hole at head side of range was reported as error; this should 
  sucess for MPOL_DEFAULT;
line 89: hole at tail side of range was reported as success; this should 
  fail for !MPOL_DEFAULT cases;
  
My patch only corrected line 86 case, but didn't handle line 89 case. It 
is better to detect valid or invalid hole for MPOL_DEFAULT and 
!MPOL_DEFAULT cases in queue_pages_range phase.

New patch will be prepared, and fullfill the linux API description:
1. for MPOL_DEFAULT, hole at any part of specified range is allowed;
2. for !MPOL_DEFAULT, hole at any part of specified range is not allowed.

Xinhai
(BTW, I am adding two more mail accounts of mine to check which is best for 
this mailling list...)

>> > ---
>> >
>> >  mm/mempolicy.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> >
>> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> > index 4ae967bcf954..ae160d9936d9 100644
>> > --- a/mm/mempolicy.c
>> > +++ b/mm/mempolicy.c
>> > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>> >         unsigned long vmend;
>> > 
>> >         vma = find_vma(mm, start);
>> > -       if (!vma || vma->vm_start > start)
>> > +       if (!vma || vma->vm_start >= end)
>> >                 return -EFAULT;
>> > 
>> >         prev = vma->vm_prev;
>> >
>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux