On Fri, Sep 06, 2019 at 06:41:45AM -0700, Matthew Wilcox wrote: > On Fri, Sep 06, 2019 at 03:59:28PM +0300, Kirill A. Shutemov wrote: > > > +/** > > > + * __find_get_page - Find and get a page cache entry. > > > + * @mapping: The address_space to search. > > > + * @offset: The page cache index. > > > + * @order: The minimum order of the entry to return. > > > + * > > > + * Looks up the page cache entries at @mapping between @offset and > > > + * @offset + 2^@order. If there is a page cache page, it is returned with > > > > Off by one? :P > > Hah! I thought it reasonable to be ambiguous in the English description > ... it's not entirely uncommon to describe something being 'between A > and B' when meaning ">= A and < B". It is reasable. I was just a nitpick. > > > + if (compound_order(page) < order) { > > > + page = XA_RETRY_ENTRY; > > > + goto out; > > > + } > > > > compound_order() is not stable if you don't have pin on the page. > > Check it after page_cache_get_speculative(). > > Maybe check both before and after? If we check it before, we don't bother > to bump the refcount on a page which is too small. Makes sense. False-positives should be rare enough to ignore them. > > > @@ -1632,6 +1696,10 @@ EXPORT_SYMBOL(find_lock_entry); > > > * - FGP_FOR_MMAP: Similar to FGP_CREAT, only we want to allow the caller to do > > > * its own locking dance if the page is already in cache, or unlock the page > > > * before returning if we had to add the page to pagecache. > > > + * - FGP_PMD: We're only interested in pages at PMD granularity. If there > > > + * is no page here (and FGP_CREATE is set), we'll create one large enough. > > > + * If there is a smaller page in the cache that overlaps the PMD page, we > > > + * return %NULL and do not attempt to create a page. > > > > Is it really the best inteface? > > > > Maybe allow user to ask bitmask of allowed orders? For THP order-0 is fine > > if order-9 has failed. > > That's the semantics that filemap_huge_fault() wants. If the page isn't > available at order-9, it needs to return VM_FAULT_FALLBACK (and the VM > will call into filemap_fault() to handle the regular sized fault). Ideally, we should not have division between ->fault and ->huge_fault. Integrating them together will give a shorter fallback loop and more flexible inteface here would give benefit. But I guess it's out-of-scope of the patchset. -- Kirill A. Shutemov