Re: [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for pmd mappings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 22, 2016 at 06:21:22PM -0600, Toshi Kani wrote:
> When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using pmd page
> size.  This feature relies on both mmap virtual address and FS
> block (i.e. physical address) to be aligned by the pmd page size.
> Users can use mkfs options to specify FS to align block allocations.
> However, aligning mmap address requires code changes to existing
> applications for providing a pmd-aligned address to mmap().
> 
> For instance, fio with "ioengine=mmap" performs I/Os with mmap() [1].
> It calls mmap() with a NULL address, which needs to be changed to
> provide a pmd-aligned address for testing with DAX pmd mappings.
> Changing all applications that call mmap() with NULL is undesirable.
> 
> Add thp_get_unmapped_area(), which can be called by filesystem's
> get_unmapped_area to align an mmap address by the pmd size for
> a DAX file.  It calls the default handler, mm->get_unmapped_area(),
> to find a range and then aligns it for a DAX file.
> 
> thp_get_unmapped_area() can be extended for huge page cache support.
> 
> The patch is based on Matthew Wilcox's change that allows adding
> support of the pud page size easily.

See Hugh's implementation:

http://lkml.kernel.org/r/alpine.LSU.2.11.1604051420110.5965@eggly.anvils

> [1]: https://github.com/axboe/fio/blob/master/engines/mmap.c
> Signed-off-by: Toshi Kani <toshi.kani@xxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>
> Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: Theodore Ts'o <tytso@xxxxxxx>
> Cc: Andreas Dilger <adilger.kernel@xxxxxxxxx>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> ---
>  include/linux/huge_mm.h |    7 +++++++
>  mm/huge_memory.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7008623..3769674 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -85,6 +85,10 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
>  
>  extern unsigned long transparent_hugepage_flags;
>  
> +extern unsigned long thp_get_unmapped_area(struct file *filp,
> +		unsigned long addr, unsigned long len, unsigned long pgoff,
> +		unsigned long flags);
> +
>  extern void prep_transhuge_page(struct page *page);
>  extern void free_transhuge_page(struct page *page);
>  
> @@ -163,6 +167,9 @@ struct page *get_huge_zero_page(void);
>  #define transparent_hugepage_enabled(__vma) 0
>  
>  #define transparent_hugepage_flags 0UL
> +
> +#define thp_get_unmapped_area	NULL
> +
>  static inline int
>  split_huge_page_to_list(struct page *page, struct list_head *list)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 86f9f8b..2181c7f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -790,6 +790,49 @@ void prep_transhuge_page(struct page *page)
>  	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
>  }
>  
> +unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
> +		loff_t off, unsigned long flags, unsigned long size)
> +{
> +	unsigned long addr;
> +	loff_t off_end = off + len;
> +	loff_t off_align = round_up(off, size);
> +	unsigned long len_pad;
> +
> +	if (off_end <= off_align || (off_end - off_align) < size)
> +		return 0;
> +
> +	len_pad = len + size;
> +	if (len_pad < len || (off + len_pad) < off)
> +		return 0;
> +
> +	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
> +					      off >> PAGE_SHIFT, flags);
> +	if (IS_ERR_VALUE(addr))
> +		return 0;
> +
> +	addr += (off - addr) & (size - 1);
> +	return addr;

Hugh has more sanity checks before and after call to get_unmapped_area().
Please, consider borrowing them.

> +}
> +
> +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> +		unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> +	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
> +
> +	if (addr)
> +		goto out;

I think it's too strong reaction to hint, isn't it?
We definately need this for MAP_FIXED. But in general? Maybe.

> +	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
> +		goto out;
> +
> +	addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
> +	if (addr)
> +		return addr;
> +
> + out:
> +	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> +}
> +EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
> +
>  static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
>  					struct vm_area_struct *vma,
>  					unsigned long address, pmd_t *pmd,
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux