Re: [PATCH 14/15] mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dan,

We've been doing some experimenting and testing with this patchset.
Specifically, we are trying to use you're ZONE_DEVICE work to enable
peer to peer PCIe transfers. This is actually working pretty well
(though we're still testing and working through some things).

However, we've found a couple of issues:

On Wed, Sep 23, 2015 at 12:42:27AM -0400, Dan Williams wrote:
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3d6baa7d4534..20097e7b679a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -49,12 +49,16 @@ struct page {
>  					 * updated asynchronously */
>  	union {
>  		struct address_space *mapping;	/* If low bit clear, points to
> -						 * inode address_space, or NULL.
> +						 * inode address_space, unless
> +						 * the page is in ZONE_DEVICE
> +						 * then it points to its parent
> +						 * dev_pagemap, otherwise NULL.
>  						 * If page mapped as anonymous
>  						 * memory, low bit is set, and
>  						 * it points to anon_vma object:
>  						 * see PAGE_MAPPING_ANON below.
>  						 */
> +		struct dev_pagemap *pgmap;
>  		void *s_mem;			/* slab first object */
>  	};


When you add to this union and overide the mapping value, we see bugs
in calls to set_page_dirty when it tries to dereference mapping. I believe
a change to page_mapping is required such as the patch that's at the end of
this email.


> diff --git a/mm/gup.c b/mm/gup.c
> index a798293fc648..1064e9a489a4 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -98,7 +98,16 @@ retry:
>  	}
>
>  	page = vm_normal_page(vma, address, pte);
> -	if (unlikely(!page)) {
> +	if (!page && pte_devmap(pte) && (flags & FOLL_GET)) {
> +		/*
> +		 * Only return device mapping pages in the FOLL_GET case since
> +		 * they are only valid while holding the pgmap reference.
> +		 */
> +		if (get_dev_pagemap(pte_pfn(pte), NULL))
> +			page = pte_page(pte);
> +		else
> +			goto no_page;
> +	} else if (unlikely(!page)) {

I've found that if a driver creates a ZONE_DEVICE mapping but doesn't
create the pagemap (using devm_register_pagemap) then the get_user_pages code
will go into an infinite loop. I'm not really sure if this as an issue or
not but it seems a bit undesirable for a buggy driver to be able to cause this.

My thoughts are that either devm_register_pagemap needs to be done by
devm_memremap_pages so a driver cannot use one without the other,
or the GUP code needs to return EFAULT if no pagemap was registered so
it doesn't loop forever.

Thanks!

Logan



diff --git a/mm/util.c b/mm/util.c
index 68ff8a5..19af683 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -368,6 +368,9 @@ struct address_space *page_mapping(struct page *page)
 		return swap_address_space(entry);
 	}

+	if (unlikely(is_zone_device_page(page)))
+		return NULL;
+
 	mapping = (unsigned long)page->mapping;
 	if (mapping & PAGE_MAPPING_FLAGS)
 		return NULL;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]