On Sun, Dec 27, 2015 at 12:46 AM, Bob Liu <lliubbo@xxxxxxxxx> wrote:
> On Mon, Dec 21, 2015 at 1:45 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>> get_dev_pagemap() enables paths like get_user_pages() to pin a
>> dynamically mapped pfn-range (devm_memremap_pages()) while the
>> resulting struct page objects are in use. Unlike get_page() it may
>> fail if the device is, or is in the process of being, disabled. While
>> the initial lookup of the range may be an expensive list walk, the
>> result is cached to speed up subsequent lookups, which are likely to
>> be in the same mapped range.
>>
>> devm_memremap_pages() now requires a reference counter to be
>> specified at init time. For pmem this means moving request_queue
>> allocation into pmem_alloc() so the existing queue usage counter can
>> track "device pages".
>>
>> ZONE_DEVICE pages always have an elevated count and will never be on
>> an lru reclaim list. That space in 'struct page' can be redirected
>> for other uses, but for safety introduce a poison value that will
>> always trip __list_add() to assert. This allows half of the struct
>> list_head storage to be reclaimed, with some assurance to back up the
>> assumption that the page count never goes to zero and a list_add()
>> is never attempted.
>>
>> Cc: Dave Hansen <dave@xxxxxxxx>
>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>
>> Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
>> Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
>> Tested-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
>> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>> ---
[..]
>> +static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
>> +		struct dev_pagemap *pgmap)
>> +{
>> +	const struct resource *res = pgmap ? pgmap->res : NULL;
>> +	resource_size_t phys = PFN_PHYS(pfn);
>> +
>> +	/*
>> +	 * In the cached case we're already holding a live reference so
>> +	 * we can simply do a blind increment
>> +	 */
>> +	if (res && phys >= res->start && phys <= res->end) {
>> +		percpu_ref_get(pgmap->ref);
>> +		return pgmap;
>> +	}
>> +
>> +	/* fall back to slow path lookup */
>> +	rcu_read_lock();
>> +	pgmap = find_dev_pagemap(phys);
>
> Is it possible to just use pfn_to_page() and then return page->pgmap?
> Then we can get rid of the pgmap_radix tree totally.

No, for two reasons:

1/ find_dev_pagemap() is used in places where pfn_to_page() is not yet
   established (see: to_vmem_altmap())

2/ at shutdown, new get_dev_pagemap() requests can race the memmap
   being torn down.

So, unless we already hold a reference against the page_map, we always
need to look it up under a lock to know that pfn_to_page() is
returning a valid page.