Re: [Bug 56881] New: MAP_HUGETLB mmap fails for certain sizes

On 2013/4/24 6:26, Naoya Horiguchi wrote:

> On Tue, Apr 23, 2013 at 01:25:22PM -0700, Andrew Morton wrote:
>>
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Sat, 20 Apr 2013 03:00:30 +0000 (UTC) bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=56881
>>>
>>>            Summary: MAP_HUGETLB mmap fails for certain sizes
>>>            Product: Memory Management
>>>            Version: 2.5
>>>     Kernel Version: 3.5.0-27
>>
>> Thanks.
>>
>> It's a post-3.4 regression, testcase included.  Does someone want to
>> take a look, please?
> 
> Let me try it.
> 
>   static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>   {                                                                            
>           struct inode *inode = file->f_path.dentry->d_inode;
>           loff_t len, vma_len;                               
>           int ret;                                           
>           struct hstate *h = hstate_file(file);              
>           ...                                                                               
>           if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))              
>                   return -EINVAL;                                              
> 
> This code checks only whether a given hugetlb vma covers (1 << order)
> pages, not whether it's exactly hugepage aligned.
> Before 2b37c35e6552 "fs/hugetlbfs/inode.c: fix pgoff alignment
> checking on 32-bit", it was
> 
>   if (vma->vm_pgoff & ~(huge_page_mask(h) >> PAGE_SHIFT))
> 
> , but this made no sense because ~(huge_page_mask(h) >> PAGE_SHIFT) is
> 0xff for 2M hugepage.
> I think the reported problem is not a bug because the behavior before
> this change was wrong or not as expected.
> 
> If we want to make sure that a given address range fits hugepage size,
> something like below can be useful.
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 78bde32..a98304b 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -113,11 +113,11 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
>  	vma->vm_ops = &hugetlb_vm_ops;
>  
> -	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
> -		return -EINVAL;
> -
>  	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
>  
> +	if (vma_len & ~huge_page_mask(h))
> +		return -EINVAL;
> +
>  	mutex_lock(&inode->i_mutex);
>  	file_accessed(file);
>  
> 
> Thanks,
> Naoya Horiguchi
> 

Hi Naoya,

I think the -EINVAL comes from hugetlb_get_unmapped_area(), not from
hugetlbfs_file_mmap(). Consider the two testcases:
1) $ ./mmappu $((5 * 2 * 1024 * 1024 - 4096))	//len1 = 0x9ff000
2) $ ./mmappu $((5 * 2 * 1024 * 1024 - 4095))	//len2 = 0x9ff001

In do_mmap_pgoff(), after "len = PAGE_ALIGN(len);", len1 is still 0x9ff000
but len2 has been rounded up to 0xa00000. So len2 passes the
"if (len & ~huge_page_mask(h))" check in hugetlb_get_unmapped_area(),
while len1 fails it and gets -EINVAL, as follows:

do_mmap_pgoff()
{
	...
	/* Careful about overflows.. */
	len = PAGE_ALIGN(len);
	...
	get_unmapped_area()
		-->hugetlb_get_unmapped_area()
		   {
			...
			if (len & ~huge_page_mask(h))
				return -EINVAL;
			...
		   }
}
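
To make the arithmetic above concrete, here is a minimal user-space sketch
of the two steps. It is not kernel code: the sketch_* names and the
SKETCH_* constants are made up for illustration, and it assumes 4 KiB base
pages and 2 MiB huge pages as in the report.

```c
#include <assert.h>

/* Hypothetical stand-ins for PAGE_SIZE and huge_page_size(h);
 * assumes 4 KiB base pages and 2 MiB huge pages. */
#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_HPAGE_SIZE  (2UL * 1024 * 1024)

/* Round len up to a multiple of align (align must be a power of two). */
static unsigned long sketch_align(unsigned long len, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/* Mirrors "len & ~huge_page_mask(h)": nonzero means the length is not a
 * multiple of the huge page size, which the kernel answers with -EINVAL. */
static int sketch_len_rejected(unsigned long len)
{
	return (len & (SKETCH_HPAGE_SIZE - 1)) != 0;
}

/* Models the current order of operations: PAGE_ALIGN first, then the
 * hugepage-size check on the already page-aligned length. */
static int sketch_mmap_rejected(unsigned long len)
{
	return sketch_len_rejected(sketch_align(len, SKETCH_PAGE_SIZE));
}
```

Under these assumptions sketch_mmap_rejected(0x9ff000) is true and
sketch_mmap_rejected(0x9ff001) is false, which matches the two testcases.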

Do we need to align len to the hugepage size when it is a hugetlbfs mmap?
Something like the patch below:

---
 mm/mmap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 0db0de1..bd42be24 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1188,7 +1188,10 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 		addr = round_hint_to_min(addr);
 
 	/* Careful about overflows.. */
-	len = PAGE_ALIGN(len);
+	if (file && is_file_hugepages(file))
+		len = ALIGN(len, huge_page_size(hstate_file(file)));
+	else
+		len = PAGE_ALIGN(len);
 	if (!len)
 		return -ENOMEM;
 
-- 
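
The effect of the rounding proposed in the patch can be checked in user
space with a small sketch. Again this is only an illustration: the demo_*
names are hypothetical, is_hugetlb stands in for
"file && is_file_hugepages(file)", and the sizes assume 4 KiB base pages
and 2 MiB huge pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PAGE_SIZE and huge_page_size(hstate_file(file)). */
#define DEMO_PAGE_SIZE   4096UL
#define DEMO_HPAGE_SIZE  (2UL * 1024 * 1024)

/* Round len up to a multiple of align (align must be a power of two). */
static unsigned long demo_round_up(unsigned long len, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/* Length after the proposed rounding: hugepage granularity for hugetlbfs
 * files, base-page granularity for everything else. */
static unsigned long demo_patched_len(unsigned long len, bool is_hugetlb)
{
	return demo_round_up(len, is_hugetlb ? DEMO_HPAGE_SIZE : DEMO_PAGE_SIZE);
}

/* True when len would pass the "len & ~huge_page_mask(h)" check. */
static bool demo_len_ok(unsigned long len)
{
	return (len & (DEMO_HPAGE_SIZE - 1)) == 0;
}
```

With these stand-ins both 0x9ff000 and 0x9ff001 become 0xa00000 on the
hugetlbfs path, so both testcases would pass the later length check.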

Thanks,
Jianguo Wu

>>>           Platform: All
>>>         OS/Version: Linux
>>>               Tree: Mainline
>>>             Status: NEW
>>>           Severity: high
>>>           Priority: P1
>>>          Component: Other
>>>         AssignedTo: akpm@xxxxxxxxxxxxxxxxxxxx
>>>         ReportedBy: iceman_dvd@xxxxxxxxx
>>>         Regression: No
>>>
>>>
>>> This is on an Ubuntu 12.10 desktop, but the same issue has been found on 12.04
>>> with 3.5.0 kernel.
>>> See the sample program. An allocation with MAP_HUGETLB consistently fails with
>>> certain sizes, while it succeeds with others.
>>> The allocation sizes are well below the number of free huge pages.
>>>
>>> $ uname -a
>>> Linux davide-lnx2 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>>
>>> # echo 100 > /proc/sys/vm/nr_hugepages
>>>
>>> # cat /proc/meminfo
>>> ...
>>> AnonHugePages:         0 kB
>>> HugePages_Total:     100
>>> HugePages_Free:      100
>>> HugePages_Rsvd:        0
>>> HugePages_Surp:        0
>>> Hugepagesize:       2048 kB
>>>
>>>
>>> $ ./mmappu $((5 * 2 * 1024 * 1024 - 4096))
>>> size=10481664    0x9ff000
>>> hugepage mmap: Invalid argument
>>>
>>>
>>> $ ./mmappu $((5 * 2 * 1024 * 1024 - 4095))
>>> size=10481665    0x9ff001
>>> OK!
>>>
>>>
>>> It seems the trigger point is a normal page size.
>>> The same binary works flawlessly in previous kernels.
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>