On 2019-10-28 at 15:14:51 Hugh Dickins wrote: >On Thu, 24 Oct 2019, Vlastimil Babka wrote: > >> + linux-api >> >> On 10/24/19 9:35 AM, Li Xinhai wrote: >> > From: Li Xinhai <xinhai.li@xxxxxxxxxxx> >> > >> > mbind_range silently ignore unmapped hole at middle and tail of the >> > specified range, but report EFAULT if hole at head side. >> >> >> Hmm that's unfortunate. mbind() manpage says: >> >> EFAULT Part or all of the memory range specified by nodemask and maxnode >> points outside your accessible address space. Or, there was an unmapped >> hole in the specified memory range specified by addr and len. >> >> That sounds like any hole inside the specified range should return >> EFAULT. > >Yes (though an exception is allowed when restoring to default). > >> But perhaps it can be also interpreted as you suggest, that the >> whole range is an unmapped hole. There's some risk of breaking existing >> userspace if we change it either way. >> >> > It is more reasonable to support silently ignore holes at any part of >> > the range, only report EFAULT if the whole range is in hole. >> > >> > Signed-off-by: Li Xinhai <xinhai.li@xxxxxxxxxxx> > >Xinhai, I'm sceptical about this patch: is it something you found >by code inspection, or something you found when using mbind()? > I encountered issue when using mbind (my issue was about using nodemask parameter), and then found this special range checking in mbind_range(). >I've not looked long enough to be certain, nor experimented, but: > >mbind_range() is only one stage of the mbind() syscall implementation, >and is preceded by queue_pages_range(): look what queue_pages_test_walk() >does when MPOL_MF_DISCONTIG_OK not set. > >My impression is that mbind_range() is merely correcting an omission >from the checks already made my queue_pages_test_walk() (an odd way >to proceed, I admit: would be better to check initially than later). > >I do think that you should not make this change without considering >MPOL_MF_DISCONTIG_OK and its intention. > >Hugh > A program was used to reveal issues as below. #include <stddef.h> #include <sys/mman.h> #include <numaif.h> int main(int argc, char *argv[]) { void *mapAddr; unsigned long nodemask; mapAddr = mmap(NULL, 6*(1<<12), PROT_READ|PROT_WRITE, MAP_PRIVATE| MAP_ANONYMOUS, -1, 0); // BIND and leave 2 pages as hole in the middle nodemask = 0x1; mbind(mapAddr, 6*(1<<12), MPOL_BIND, &nodemask, 2, 0); munmap(mapAddr+2*(1<<12), 2*(1<<12)); // part 1 mbind(mapAddr-1*(1<<12), 2*(1<<12), MPOL_DEFAULT, NULL, 0, 0); mbind(mapAddr, 3*(1<<12), MPOL_DEFAULT, NULL, 0, 0); // part 2 nodemask = 0x2; mbind(mapAddr+3*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0); mbind(mapAddr+4*(1<<12), 3*(1<<12), MPOL_BIND, &nodemask, 3, 0); mbind(mapAddr+3*(1<<12), 1*(1<<12), MPOL_BIND, &nodemask, 3, 0); mbind(mapAddr+4*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0); return 0; } syscall results: 83 mmap(NULL, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd24e13000 84 mbind(0x7fbd24e13000, 24576, MPOL_BIND, [0x0000000000000001], 2, 0) = 0 85 munmap(0x7fbd24e15000, 8192) = 0 // part 1 86 mbind(0x7fbd24e12000, 8192, MPOL_DEFAULT, NULL, 0, 0) = -1 EFAULT (Bad address) 87 mbind(0x7fbd24e13000, 12288, MPOL_DEFAULT, NULL, 0, 0) = 0 // part 2 88 mbind(0x7fbd24e16000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address) 89 mbind(0x7fbd24e17000, 12288, MPOL_BIND, [0x0000000000000002], 3, 0) = 0 90 mbind(0x7fbd24e16000, 4096, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address) 91 mbind(0x7fbd24e17000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = 0 The results on line 86 and line 89 were not correct (other lines were expected): line 86: hole at head side of range was reported as error; this should sucess for MPOL_DEFAULT; line 89: hole at tail side of range was reported as success; this should fail for !MPOL_DEFAULT cases; My patch only corrected line 86 case, but didn't handle line 89 case. It is better to detect valid or invalid hole for MPOL_DEFAULT and !MPOL_DEFAULT cases in queue_pages_range phase. New patch will be prepared, and fullfill the linux API description: 1. for MPOL_DEFAULT, hole at any part of specified range is allowed; 2. for !MPOL_DEFAULT, hole at any part of specified range is not allowed. Xinhai (BTW, I am adding two more mail accounts of mine to check which is best for this mailling list...) >> > --- >> > >> > mm/mempolicy.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > >> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c >> > index 4ae967bcf954..ae160d9936d9 100644 >> > --- a/mm/mempolicy.c >> > +++ b/mm/mempolicy.c >> > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start, >> > unsigned long vmend; >> > >> > vma = find_vma(mm, start); >> > - if (!vma || vma->vm_start > start) >> > + if (!vma || vma->vm_start >= end) >> > return -EFAULT; >> > >> > prev = vma->vm_prev; >> > >