Re: [PATCH v4 00/16] Overhaul multi-page lookups for THP

On Wed, Nov 25, 2020 at 3:09 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 25 Nov 2020 02:32:34 +0000 Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> > On Tue, Nov 17, 2020 at 11:43:02PM +0000, Matthew Wilcox wrote:
> > > On Tue, Nov 17, 2020 at 07:15:13PM +0000, Matthew Wilcox wrote:
> > > > I find both of these functions exceptionally confusing.  Does this
> > > > make it easier to understand?
> > >
> > > Never mind, this is buggy.  I'll send something better tomorrow.
> >
> > That took a week, not a day.  *sigh*.  At least this is shorter.
> >
> > commit 1a02863ce04fd325922d6c3db6d01e18d55f966b
> > Author: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > Date:   Tue Nov 17 10:45:18 2020 -0500
> >
> >     fix mm-truncateshmem-handle-truncates-that-split-thps.patch
> >
>
> That's a big patch.  Big enough to put
> mm-truncateshmem-handle-truncates-that-split-thps.patch back into
> unreviewed state, methinks.  And big enough to have a changelog!
>
> Below is the folded-together result for reviewers, please.

I have not reviewed the changes at all, but have been testing.

Responding hastily in gmail, which will probably garble the result
(sorry), because I think you may be working towards another mmotm and
there's one little fix definitely needed; the machine I usually mail
patches from is stuck in a different hang (running with this patch)
that I need to examine before rebooting - though probably not
something that the average user will ever encounter.

In general, this series behaves a lot better than it did nine days
ago: LTP results on shmem huge pages match those from before the
series; only one xfstest now fails that did not fail before
(generic/539 - not yet analysed, may be of no importance); and until
that hang there were no problems seen running my tmpfs swapping
loads.

Though I did see generic/476 stuck in shmem_undo_range() on one
machine, and will need to reproduce and investigate that.

The little fix definitely needed was shown by generic/083: every
fsstress process ends up waiting for a page lock, and that happens
even without forcing huge pages. See below...

> Is the changelog still accurate and complete?
>
>
> From: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
> Subject: mm/truncate,shmem: handle truncates that split THPs
>
> Handle THP splitting in the parts of the truncation functions which
> already handle partial pages.  Factor all that code out into a new
> function called truncate_inode_partial_page().
>
> We lose the easy 'bail out' path if a truncate or hole punch is entirely
> within a single page.  We can add some more complex logic to restore the
> optimisation if it proves to be worthwhile.
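
(Aside, for reviewers: the shortcut lost is the old "if (start >= end)"
bail-out deleted further down.  If it does prove worthwhile, a minimal
sketch of how truncate_inode_pages_range() might restore it -
illustrative only, not part of this patch, and untested:

	/* Sketch: whole range within a single page? */
	if ((lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT)) {
		page = find_lock_head(mapping, lstart >> PAGE_SHIFT);
		if (page)
			truncate_inode_partial_page(mapping, page,
							lstart, lend);
		goto out;
	}

using find_lock_head() just as the patch already does at the top of
truncate_inode_pages_range().)
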
>
> [willy@xxxxxxxxxxxxx: fix]
>   Link: https://lkml.kernel.org/r/20201125023234.GH4327@xxxxxxxxxxxxxxxxxxxx
> Link: https://lkml.kernel.org/r/20201112212641.27837-16-willy@xxxxxxxxxxxxx
> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> Reviewed-by: Jan Kara <jack@xxxxxxx>
> Reviewed-by: William Kucharski <william.kucharski@xxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Dave Chinner <dchinner@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
>  mm/internal.h |    2
>  mm/shmem.c    |  137 +++++++++++++----------------------------
>  mm/truncate.c |  157 +++++++++++++++++++++++-------------------------
>  3 files changed, 122 insertions(+), 174 deletions(-)
>
> --- a/mm/internal.h~mm-truncateshmem-handle-truncates-that-split-thps
> +++ a/mm/internal.h
> @@ -623,4 +623,6 @@ struct migration_target_control {
>         gfp_t gfp_mask;
>  };
>
> +pgoff_t truncate_inode_partial_page(struct address_space *mapping,
> +               struct page *page, loff_t start, loff_t end);
>  #endif /* __MM_INTERNAL_H */
> --- a/mm/shmem.c~mm-truncateshmem-handle-truncates-that-split-thps
> +++ a/mm/shmem.c
> @@ -858,32 +858,6 @@ void shmem_unlock_mapping(struct address
>  }
>
>  /*
> - * Check whether a hole-punch or truncation needs to split a huge page,
> - * returning true if no split was required, or the split has been successful.
> - *
> - * Eviction (or truncation to 0 size) should never need to split a huge page;
> - * but in rare cases might do so, if shmem_undo_range() failed to trylock on
> - * head, and then succeeded to trylock on tail.
> - *
> - * A split can only succeed when there are no additional references on the
> - * huge page: so the split below relies upon find_get_entries() having stopped
> - * when it found a subpage of the huge page, without getting further references.
> - */
> -static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
> -{
> -       if (!PageTransCompound(page))
> -               return true;
> -
> -       /* Just proceed to delete a huge page wholly within the range punched */
> -       if (PageHead(page) &&
> -           page->index >= start && page->index + HPAGE_PMD_NR <= end)
> -               return true;
> -
> -       /* Try to split huge page, so we can truly punch the hole or truncate */
> -       return split_huge_page(page) >= 0;
> -}
> -
> -/*
>   * Remove range of pages and swap entries from page cache, and free them.
>   * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
>   */
> @@ -892,26 +866,33 @@ static void shmem_undo_range(struct inod
>  {
>         struct address_space *mapping = inode->i_mapping;
>         struct shmem_inode_info *info = SHMEM_I(inode);
> -       pgoff_t start = (lstart + PAGE_SIZE - 1) >> PAGE_SHIFT;
> -       pgoff_t end = (lend + 1) >> PAGE_SHIFT;
> -       unsigned int partial_start = lstart & (PAGE_SIZE - 1);
> -       unsigned int partial_end = (lend + 1) & (PAGE_SIZE - 1);
> +       pgoff_t start, end;
>         struct pagevec pvec;
>         pgoff_t indices[PAGEVEC_SIZE];
> +       struct page *page;
>         long nr_swaps_freed = 0;
>         pgoff_t index;
>         int i;
>
> -       if (lend == -1)
> -               end = -1;       /* unsigned, so actually very big */
> +       page = NULL;
> +       start = lstart >> PAGE_SHIFT;
> +       shmem_getpage(inode, start, &page, SGP_READ);
> +       if (page) {
> +               page = thp_head(page);
> +               set_page_dirty(page);
> +               start = truncate_inode_partial_page(mapping, page, lstart,
> +                                                       lend);
> +       }
> +
> +       /* 'end' includes a partial page */
> +       end = lend / PAGE_SIZE;
>
>         pagevec_init(&pvec);
>         index = start;
>         while (index < end && find_lock_entries(mapping, index, end - 1,
> -                       &pvec, indices)) {
> +                                                       &pvec, indices)) {
>                 for (i = 0; i < pagevec_count(&pvec); i++) {
> -                       struct page *page = pvec.pages[i];
> -
> +                       page = pvec.pages[i];
>                         index = indices[i];
>
>                         if (xa_is_value(page)) {
> @@ -921,8 +902,6 @@ static void shmem_undo_range(struct inod
>                                                                 index, page);
>                                 continue;
>                         }
> -                       index += thp_nr_pages(page) - 1;
> -
>                         if (!unfalloc || !PageUptodate(page))
>                                 truncate_inode_page(mapping, page);
>                         unlock_page(page);
> @@ -933,90 +912,60 @@ static void shmem_undo_range(struct inod
>                 index++;
>         }
>
> -       if (partial_start) {
> -               struct page *page = NULL;
> -               shmem_getpage(inode, start - 1, &page, SGP_READ);
> -               if (page) {
> -                       unsigned int top = PAGE_SIZE;
> -                       if (start > end) {
> -                               top = partial_end;
> -                               partial_end = 0;
> -                       }
> -                       zero_user_segment(page, partial_start, top);
> -                       set_page_dirty(page);
> -                       unlock_page(page);
> -                       put_page(page);
> -               }
> -       }
> -       if (partial_end) {
> -               struct page *page = NULL;
> -               shmem_getpage(inode, end, &page, SGP_READ);
> -               if (page) {
> -                       zero_user_segment(page, 0, partial_end);
> -                       set_page_dirty(page);
> -                       unlock_page(page);
> -                       put_page(page);
> -               }
> -       }
> -       if (start >= end)
> -               return;
> -
>         index = start;
> -       while (index < end) {
> +       while (index <= end) {
>                 cond_resched();
>
> -               if (!find_get_entries(mapping, index, end - 1, &pvec,
> -                               indices)) {
> +               if (!find_get_entries(mapping, index, end, &pvec, indices)) {
>                         /* If all gone or hole-punch or unfalloc, we're done */
> -                       if (index == start || end != -1)
> +                       if (index == start || lend != (loff_t)-1)
>                                 break;
>                         /* But if truncating, restart to make sure all gone */
>                         index = start;
>                         continue;
>                 }
> +
>                 for (i = 0; i < pagevec_count(&pvec); i++) {
> -                       struct page *page = pvec.pages[i];
> +                       page = pvec.pages[i];
>
> -                       index = indices[i];
>                         if (xa_is_value(page)) {
>                                 if (unfalloc)
>                                         continue;
> -                               if (shmem_free_swap(mapping, index, page)) {
> -                                       /* Swap was replaced by page: retry */
> -                                       index--;
> -                                       break;
> +                               index = indices[i];
> +                               if (index == end) {
> +                                       /* Partial page swapped out? */
> +                                       shmem_getpage(inode, end, &page,
> +                                                               SGP_READ);
> +                               } else {
> +                                       if (shmem_free_swap(mapping, index,
> +                                                               page)) {
> +                                               /* Swap replaced: retry */
> +                                               break;
> +                                       }
> +                                       nr_swaps_freed++;
> +                                       continue;
>                                 }
> -                               nr_swaps_freed++;
> -                               continue;
> +                       } else {
> +                               lock_page(page);
>                         }
>
> -                       lock_page(page);
> -
>                         if (!unfalloc || !PageUptodate(page)) {
>                                 if (page_mapping(page) != mapping) {
>                                         /* Page was replaced by swap: retry */
>                                         unlock_page(page);
> -                                       index--;
> +                                       put_page(page);
>                                         break;
>                                 }
>                                 VM_BUG_ON_PAGE(PageWriteback(page), page);
> -                               if (shmem_punch_compound(page, start, end))
> -                                       truncate_inode_page(mapping, page);
> -                               else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> -                                       /* Wipe the page and don't get stuck */
> -                                       clear_highpage(page);
> -                                       flush_dcache_page(page);
> -                                       set_page_dirty(page);
> -                                       if (index <
> -                                           round_up(start, HPAGE_PMD_NR))
> -                                               start = index + 1;
> -                               }
> +                               index = truncate_inode_partial_page(mapping,
> +                                               page, lstart, lend);
> +                               if (index > end)
> +                                       end = indices[i] - 1;
>                         }
> -                       unlock_page(page);

The fix needed is here: instead of deleting that unlock_page(page)
line, it needs to be } else { unlock_page(page); }
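
i.e. the end of that block becomes (sketch only, whitespace no doubt
mangled by gmail):

			if (!unfalloc || !PageUptodate(page)) {
				...
				index = truncate_inode_partial_page(mapping,
						page, lstart, lend);
				if (index > end)
					end = indices[i] - 1;
			} else {
				unlock_page(page);
			}

Without that else, the unfalloc-of-uptodate case never reaches
truncate_inode_partial_page() (which does the unlocking now), so the
page is left locked behind it - which is what generic/083's fsstress
processes were all waiting on.
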

>                 }
> +               index = indices[i - 1] + 1;
>                 pagevec_remove_exceptionals(&pvec);
> -               pagevec_release(&pvec);
> -               index++;
> +               pagevec_reinit(&pvec);
>         }
>
>         spin_lock_irq(&info->lock);
> --- a/mm/truncate.c~mm-truncateshmem-handle-truncates-that-split-thps
> +++ a/mm/truncate.c
> @@ -225,6 +225,63 @@ int truncate_inode_page(struct address_s
>  }
>
>  /*
> + * Handle partial (transparent) pages.  If the page is entirely within
> + * the range, we discard it.  If not, we split the page if it's a THP
> + * and zero the part of the page that's within the [start, end] range.
> + * split_page_range() will discard any of the subpages which now lie
> + * beyond i_size, and the caller will discard pages which lie within a
> + * newly created hole.
> + *
> + * Return: The index after the current page.
> + */
> +pgoff_t truncate_inode_partial_page(struct address_space *mapping,
> +               struct page *page, loff_t start, loff_t end)
> +{
> +       loff_t pos = page_offset(page);
> +       pgoff_t next_index = page->index + thp_nr_pages(page);
> +       unsigned int offset, length;
> +
> +       if (pos < start)
> +               offset = start - pos;
> +       else
> +               offset = 0;
> +       length = thp_size(page);
> +       if (pos + length <= (u64)end)
> +               length = length - offset;
> +       else
> +               length = end + 1 - pos - offset;
> +
> +       wait_on_page_writeback(page);
> +       if (length == thp_size(page))
> +               goto truncate;
> +
> +       cleancache_invalidate_page(page->mapping, page);
> +       if (page_has_private(page))
> +               do_invalidatepage(page, offset, length);
> +       if (!PageTransHuge(page))
> +               goto zero;
> +       page += offset / PAGE_SIZE;
> +       if (split_huge_page(page) < 0) {
> +               page -= offset / PAGE_SIZE;
> +               goto zero;
> +       }
> +       next_index = page->index + 1;
> +       offset %= PAGE_SIZE;
> +       if (offset == 0 && length >= PAGE_SIZE)
> +               goto truncate;
> +       length = PAGE_SIZE - offset;
> +zero:
> +       zero_user(page, offset, length);
> +       goto out;
> +truncate:
> +       truncate_inode_page(mapping, page);
> +out:
> +       unlock_page(page);
> +       put_page(page);
> +       return next_index;
> +}
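
(To make the offset/length arithmetic above concrete - if I'm following
it correctly - take 4kB pages, a 2MB THP at index 0, and a hole punch
of bytes 0x3000-0x5fff, i.e. subpages 3, 4 and 5:

	pos = 0, so offset = start - pos = 0x3000
	pos + thp_size > end, so length = end + 1 - pos - offset = 0x3000
	length != thp_size and the page is huge, so advance to subpage 3
	after a successful split: next_index = 4, offset %= PAGE_SIZE = 0
	offset == 0 and length >= PAGE_SIZE, so subpage 3 is truncated
	return next_index = 4

Subpages 4 and 5, now order-0 pages, are left for the caller's loop
over the remaining range to find and truncate.)
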
> +
> +/*
>   * Used to get rid of pages on hardware memory corruption.
>   */
>  int generic_error_remove_page(struct address_space *mapping, struct page *page)
> @@ -275,10 +332,6 @@ int invalidate_inode_page(struct page *p
>   * The first pass will remove most pages, so the search cost of the second pass
>   * is low.
>   *
> - * We pass down the cache-hot hint to the page freeing code.  Even if the
> - * mapping is large, it is probably the case that the final pages are the most
> - * recently touched, and freeing happens in ascending file offset order.
> - *
>   * Note that since ->invalidatepage() accepts range to invalidate
>   * truncate_inode_pages_range is able to handle cases where lend + 1 is not
>   * page aligned properly.
> @@ -286,38 +339,24 @@ int invalidate_inode_page(struct page *p
>  void truncate_inode_pages_range(struct address_space *mapping,
>                                 loff_t lstart, loff_t lend)
>  {
> -       pgoff_t         start;          /* inclusive */
> -       pgoff_t         end;            /* exclusive */
> -       unsigned int    partial_start;  /* inclusive */
> -       unsigned int    partial_end;    /* exclusive */
> +       pgoff_t start, end;
>         struct pagevec  pvec;
>         pgoff_t         indices[PAGEVEC_SIZE];
>         pgoff_t         index;
>         int             i;
> +       struct page *   page;
>
>         if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
>                 goto out;
>
> -       /* Offsets within partial pages */
> -       partial_start = lstart & (PAGE_SIZE - 1);
> -       partial_end = (lend + 1) & (PAGE_SIZE - 1);
> +       start = lstart >> PAGE_SHIFT;
> +       page = find_lock_head(mapping, start);
> +       if (page)
> +               start = truncate_inode_partial_page(mapping, page, lstart,
> +                                                       lend);
>
> -       /*
> -        * 'start' and 'end' always covers the range of pages to be fully
> -        * truncated. Partial pages are covered with 'partial_start' at the
> -        * start of the range and 'partial_end' at the end of the range.
> -        * Note that 'end' is exclusive while 'lend' is inclusive.
> -        */
> -       start = (lstart + PAGE_SIZE - 1) >> PAGE_SHIFT;
> -       if (lend == -1)
> -               /*
> -                * lend == -1 indicates end-of-file so we have to set 'end'
> -                * to the highest possible pgoff_t and since the type is
> -                * unsigned we're using -1.
> -                */
> -               end = -1;
> -       else
> -               end = (lend + 1) >> PAGE_SHIFT;
> +       /* 'end' includes a partial page */
> +       end = lend / PAGE_SIZE;
>
>         pagevec_init(&pvec);
>         index = start;
> @@ -334,50 +373,11 @@ void truncate_inode_pages_range(struct a
>                 cond_resched();
>         }
>
> -       if (partial_start) {
> -               struct page *page = find_lock_page(mapping, start - 1);
> -               if (page) {
> -                       unsigned int top = PAGE_SIZE;
> -                       if (start > end) {
> -                               /* Truncation within a single page */
> -                               top = partial_end;
> -                               partial_end = 0;
> -                       }
> -                       wait_on_page_writeback(page);
> -                       zero_user_segment(page, partial_start, top);
> -                       cleancache_invalidate_page(mapping, page);
> -                       if (page_has_private(page))
> -                               do_invalidatepage(page, partial_start,
> -                                                 top - partial_start);
> -                       unlock_page(page);
> -                       put_page(page);
> -               }
> -       }
> -       if (partial_end) {
> -               struct page *page = find_lock_page(mapping, end);
> -               if (page) {
> -                       wait_on_page_writeback(page);
> -                       zero_user_segment(page, 0, partial_end);
> -                       cleancache_invalidate_page(mapping, page);
> -                       if (page_has_private(page))
> -                               do_invalidatepage(page, 0,
> -                                                 partial_end);
> -                       unlock_page(page);
> -                       put_page(page);
> -               }
> -       }
> -       /*
> -        * If the truncation happened within a single page no pages
> -        * will be released, just zeroed, so we can bail out now.
> -        */
> -       if (start >= end)
> -               goto out;
> -
>         index = start;
> -       for ( ; ; ) {
> +       while (index <= end) {
>                 cond_resched();
> -               if (!find_get_entries(mapping, index, end - 1, &pvec,
> -                               indices)) {
> +
> +               if (!find_get_entries(mapping, index, end, &pvec, indices)) {
>                         /* If all gone from start onwards, we're done */
>                         if (index == start)
>                                 break;
> @@ -387,23 +387,20 @@ void truncate_inode_pages_range(struct a
>                 }
>
>                 for (i = 0; i < pagevec_count(&pvec); i++) {
> -                       struct page *page = pvec.pages[i];
> -
> -                       /* We rely upon deletion not changing page->index */
> -                       index = indices[i];
> -
> +                       page = pvec.pages[i];
>                         if (xa_is_value(page))
>                                 continue;
>
>                         lock_page(page);
> -                       WARN_ON(page_to_index(page) != index);
> -                       wait_on_page_writeback(page);
> -                       truncate_inode_page(mapping, page);
> -                       unlock_page(page);
> +                       index = truncate_inode_partial_page(mapping, page,
> +                                                       lstart, lend);
> +                       /* Couldn't split a THP? */
> +                       if (index > end)
> +                               end = indices[i] - 1;
>                 }
> +               index = indices[i - 1] + 1;
>                 truncate_exceptional_pvec_entries(mapping, &pvec, indices);
> -               pagevec_release(&pvec);
> -               index++;
> +               pagevec_reinit(&pvec);
>         }
>
>  out:
> _
>


