On Wed 12-10-16 00:53:49, Kirill A. Shutemov wrote: > On Tue, Oct 11, 2016 at 05:58:15PM +0200, Jan Kara wrote: > > On Thu 15-09-16 14:54:55, Kirill A. Shutemov wrote: > > > invalidate_inode_page() has expectation about page_count() of the page > > > -- if it's not 2 (one to caller, one to radix-tree), it will not be > > > dropped. That condition almost never met for THPs -- tail pages are > > > pinned to the pagevec. > > > > > > Let's drop them, before calling invalidate_inode_page(). > > > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> > > > --- > > > mm/truncate.c | 11 +++++++++++ > > > 1 file changed, 11 insertions(+) > > > > > > diff --git a/mm/truncate.c b/mm/truncate.c > > > index a01cce450a26..ce904e4b1708 100644 > > > --- a/mm/truncate.c > > > +++ b/mm/truncate.c > > > @@ -504,10 +504,21 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping, > > > /* 'end' is in the middle of THP */ > > > if (index == round_down(end, HPAGE_PMD_NR)) > > > continue; > > > + /* > > > + * invalidate_inode_page() expects > > > + * page_count(page) == 2 to drop page from page > > > + * cache -- drop tail pages references. > > > + */ > > > + get_page(page); > > > + pagevec_release(&pvec); > > > > I'm not quite sure why this is needed. When you have multiorder entry in > > the radix tree for your huge page, then you should not get more entries in > > the pagevec for your huge page. What do I miss? > > For compatibility reason find_get_entries() (which is called by > pagevec_lookup_entries()) collects all subpages of huge page in the range > (head/tails). See patch [07/41] > > So huge page, which is fully in the range it will be pinned up to > PAGEVEC_SIZE times. Yeah, I see. But then won't it be cleaner to provide iteration method that would add to pagevec each radix tree entry (regardless of its order) only once and then use it in places where we care? Instead of strange dances like you do here? Ultimately we could convert all the places to use these new iteration methods but I don't see that as immediately necessary and maybe there are places where getting all the subpages in the pagevec actually makes life simpler for us (please point me if you know about such place). On a somewhat unrelated note: I've noticed that you don't invalidate a huge page when only part of it should be invalidated. That actually breaks some assumptions filesystems make. In particular direct IO code assumes that if you do filemap_write_and_wait_range(inode, start, end); invalidate_inode_pages2_range(inode, start, end); all the page cache covering start-end *will* be invalidated. Your skipping of partial pages breaks this assumption and thus can bring consistency issues (e.g. write done using direct IO won't be seen by following buffered read). Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html