Re: [PATCH v3] vfs: fix page locking deadlocks when deduping files

Filipe Manana <fdmanana@xxxxxxxxx> · Tue, 13 Aug 2019 16:53:09 +0100

On Tue, Aug 13, 2019 at 4:15 PM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
>
> From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
>
> When dedupe wants to use the page cache to compare parts of two files,
> we must be very careful to handle locking correctly.  The
> current code doesn't do this.  It must lock and unlock the page only
> once if the two pages are the same, since the overlapping range check
> doesn't catch this when blocksize < pagesize.  If the pages are distinct
> but from the same file, we must observe page locking order and lock them
> in order of increasing offset to avoid clashing with writeback locking.
>
> Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions")
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> Reviewed-by: Bill O'Donnell <billodo@xxxxxxxxxx>

Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>

We actually had the same bug in btrfs, before cloning/dedupe existed in
the vfs, xfs, etc., and fixed it back in 2017 [1].
I completely missed this behaviour in the vfs helpers when I updated
btrfs to use them some months ago.
Thanks.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b1517622f2524f531113b12c27b9a0ea69c38983
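
For anyone following along, the locking rule from the commit message can
be sketched in userspace roughly as below. This is only an illustration
under stated assumptions: struct fake_page, fake_lock_page(), the
lock_order log and demo_lock_order() are hypothetical stand-ins built on
pthread mutexes, not kernel code. It shows the two points that matter:
take the lower index first, and never lock the same page twice.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct fake_page {
	pthread_mutex_t lock;
	unsigned long index;	/* offset within the file, in pages */
};

/* Demo-only log of the index of each page, in the order it was locked. */
#define LOCK_LOG_MAX 8
static unsigned long lock_order[LOCK_LOG_MAX];
static size_t lock_order_len;

static void fake_lock_page(struct fake_page *page)
{
	pthread_mutex_lock(&page->lock);
	assert(lock_order_len < LOCK_LOG_MAX);
	lock_order[lock_order_len++] = page->index;
}

/* Lock two pages in order of increasing index; lock only once if equal. */
static void lock_two_pages(struct fake_page *page1, struct fake_page *page2)
{
	if (page1->index > page2->index) {
		struct fake_page *tmp = page1;

		page1 = page2;
		page2 = tmp;
	}

	fake_lock_page(page1);
	if (page1 != page2)
		fake_lock_page(page2);
}

/* Unlock two pages, taking care not to unlock the same page twice. */
static void unlock_two_pages(struct fake_page *page1, struct fake_page *page2)
{
	pthread_mutex_unlock(&page1->lock);
	if (page1 != page2)
		pthread_mutex_unlock(&page2->lock);
}

/* Returns 0 if the pages were locked in the expected order, -1 otherwise. */
int demo_lock_order(void)
{
	struct fake_page low = { .lock = PTHREAD_MUTEX_INITIALIZER, .index = 0 };
	struct fake_page high = { .lock = PTHREAD_MUTEX_INITIALIZER, .index = 1 };

	lock_order_len = 0;

	/* Arguments are passed high-first; the helper still locks low first. */
	lock_two_pages(&high, &low);
	unlock_two_pages(&high, &low);

	/* Same page on both sides: it must be locked exactly once. */
	lock_two_pages(&low, &low);
	unlock_two_pages(&low, &low);

	if (lock_order_len != 3)
		return -1;
	if (lock_order[0] != 0 || lock_order[1] != 1 || lock_order[2] != 0)
		return -1;
	return 0;
}
```

With a default (non-recursive) pthread mutex, locking the same object
twice from one thread is not safe, which is why the page1 != page2 check
matters in the sketch just as it does for lock_page().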

> ---
> v3: revalidate page after locking it
> v2: provide an unlock helper
> ---
>  fs/read_write.c |   50 ++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 1f5088dec566..da341eb3033c 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1811,10 +1811,7 @@ static int generic_remap_check_len(struct inode *inode_in,
>         return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL;
>  }
>
> -/*
> - * Read a page's worth of file data into the page cache.  Return the page
> - * locked.
> - */
> +/* Read a page's worth of file data into the page cache. */
>  static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>  {
>         struct page *page;
> @@ -1826,10 +1823,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>                 put_page(page);
>                 return ERR_PTR(-EIO);
>         }
> -       lock_page(page);
>         return page;
>  }
>
> +/*
> + * Lock two pages, ensuring that we lock in offset order if the pages are from
> + * the same file.
> + */
> +static void vfs_lock_two_pages(struct page *page1, struct page *page2)
> +{
> +       /* Always lock in order of increasing index. */
> +       if (page1->index > page2->index)
> +               swap(page1, page2);
> +
> +       lock_page(page1);
> +       if (page1 != page2)
> +               lock_page(page2);
> +}
> +
> +/* Unlock two pages, being careful not to unlock the same page twice. */
> +static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
> +{
> +       unlock_page(page1);
> +       if (page1 != page2)
> +               unlock_page(page2);
> +}
> +
>  /*
>   * Compare extents of two files to see if they are the same.
>   * Caller must have locked both inodes to prevent write races.
> @@ -1867,10 +1886,25 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>                 dest_page = vfs_dedupe_get_page(dest, destoff);
>                 if (IS_ERR(dest_page)) {
>                         error = PTR_ERR(dest_page);
> -                       unlock_page(src_page);
>                         put_page(src_page);
>                         goto out_error;
>                 }
> +
> +               vfs_lock_two_pages(src_page, dest_page);
> +
> +               /*
> +                * Now that we've locked both pages, make sure they still
> +                * represent the data we're interested in.  If not, someone
> +                * is invalidating pages on us and we lose.
> +                */
> +               if (src_page->mapping != src->i_mapping ||
> +                   src_page->index != srcoff >> PAGE_SHIFT ||
> +                   dest_page->mapping != dest->i_mapping ||
> +                   dest_page->index != destoff >> PAGE_SHIFT) {
> +                       same = false;
> +                       goto unlock;
> +               }
> +
>                 src_addr = kmap_atomic(src_page);
>                 dest_addr = kmap_atomic(dest_page);
>
> @@ -1882,8 +1916,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>
>                 kunmap_atomic(dest_addr);
>                 kunmap_atomic(src_addr);
> -               unlock_page(dest_page);
> -               unlock_page(src_page);
> +unlock:
> +               vfs_unlock_two_pages(src_page, dest_page);
>                 put_page(dest_page);
>                 put_page(src_page);
>
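
The v3 change (revalidating the page after locking it) follows a common
pattern: look the object up unlocked, lock it, then check it still is
what you looked up. A rough userspace sketch, assuming hypothetical
names (struct fake_mapping, struct fake_page, FAKE_PAGE_SHIFT,
page_still_valid(), demo_revalidate()) and a pthread mutex in place of
the page lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_PAGE_SHIFT 12	/* assumed 4 KiB pages, for illustration only */

/* Hypothetical stand-ins; not the kernel's address_space or page types. */
struct fake_mapping {
	int unused;
};

struct fake_page {
	pthread_mutex_t lock;
	struct fake_mapping *mapping;	/* NULL once the page is invalidated */
	unsigned long index;		/* offset within the file, in pages */
};

/* Caller must hold page->lock. */
static bool page_still_valid(const struct fake_page *page,
			     const struct fake_mapping *mapping,
			     long long offset)
{
	return page->mapping == mapping &&
	       page->index == (unsigned long)(offset >> FAKE_PAGE_SHIFT);
}

/* Returns 0 if the check passes before and fails after invalidation. */
int demo_revalidate(void)
{
	struct fake_mapping file = { 0 };
	struct fake_page page = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.mapping = &file,
		.index = 2,
	};
	long long offset = 2LL << FAKE_PAGE_SHIFT;
	bool ok_before, ok_after;

	pthread_mutex_lock(&page.lock);
	ok_before = page_still_valid(&page, &file, offset);
	pthread_mutex_unlock(&page.lock);
	assert(ok_before);

	/* Simulate the page being invalidated while we did not hold the lock. */
	page.mapping = NULL;

	pthread_mutex_lock(&page.lock);
	ok_after = page_still_valid(&page, &file, offset);
	pthread_mutex_unlock(&page.lock);

	return ok_after ? -1 : 0;
}
```

In the patch, a failed check is treated as "not the same" and the
compare bails out via the unlock label rather than retrying the lookup.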

-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”