On Tue, Aug 13, 2019 at 4:15 PM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
>
> From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
>
> When dedupe wants to use the page cache to compare parts of two files
> for dedupe, we must be very careful to handle locking correctly. The
> current code doesn't do this. It must lock and unlock the page only
> once if the two pages are the same, since the overlapping range check
> doesn't catch this when blocksize < pagesize. If the pages are distinct
> but from the same file, we must observe page locking order and lock them
> in order of increasing offset to avoid clashing with writeback locking.
>
> Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions")
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> Reviewed-by: Bill O'Donnell <billodo@xxxxxxxxxx>

Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>

We actually had the same bug in btrfs, before we had cloning/dedupe in
vfs/xfs/etc, and fixed it back in 2017 [1].
I totally missed this behaviour in the vfs helpers when I updated
btrfs to use them some months ago.

Thanks.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b1517622f2524f531113b12c27b9a0ea69c38983

> ---
> v3: revalidate page after locking it
> v2: provide an unlock helper
> ---
>  fs/read_write.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 1f5088dec566..da341eb3033c 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1811,10 +1811,7 @@ static int generic_remap_check_len(struct inode *inode_in,
>          return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL;
>  }
>
> -/*
> - * Read a page's worth of file data into the page cache. Return the page
> - * locked.
> - */
> +/* Read a page's worth of file data into the page cache. */
>  static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>  {
>          struct page *page;
> @@ -1826,10 +1823,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>                  put_page(page);
>                  return ERR_PTR(-EIO);
>          }
> -        lock_page(page);
>          return page;
>  }
>
> +/*
> + * Lock two pages, ensuring that we lock in offset order if the pages are from
> + * the same file.
> + */
> +static void vfs_lock_two_pages(struct page *page1, struct page *page2)
> +{
> +        /* Always lock in order of increasing index. */
> +        if (page1->index > page2->index)
> +                swap(page1, page2);
> +
> +        lock_page(page1);
> +        if (page1 != page2)
> +                lock_page(page2);
> +}
> +
> +/* Unlock two pages, being careful not to unlock the same page twice. */
> +static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
> +{
> +        unlock_page(page1);
> +        if (page1 != page2)
> +                unlock_page(page2);
> +}
> +
>  /*
>   * Compare extents of two files to see if they are the same.
>   * Caller must have locked both inodes to prevent write races.
> @@ -1867,10 +1886,25 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>                  dest_page = vfs_dedupe_get_page(dest, destoff);
>                  if (IS_ERR(dest_page)) {
>                          error = PTR_ERR(dest_page);
> -                        unlock_page(src_page);
>                          put_page(src_page);
>                          goto out_error;
>                  }
> +
> +                vfs_lock_two_pages(src_page, dest_page);
> +
> +                /*
> +                 * Now that we've locked both pages, make sure they still
> +                 * represent the data we're interested in. If not, someone
> +                 * is invalidating pages on us and we lose.
> +                 */
> +                if (src_page->mapping != src->i_mapping ||
> +                    src_page->index != srcoff >> PAGE_SHIFT ||
> +                    dest_page->mapping != dest->i_mapping ||
> +                    dest_page->index != destoff >> PAGE_SHIFT) {
> +                        same = false;
> +                        goto unlock;
> +                }
> +
>                  src_addr = kmap_atomic(src_page);
>                  dest_addr = kmap_atomic(dest_page);
>
> @@ -1882,8 +1916,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>
>                  kunmap_atomic(dest_addr);
>                  kunmap_atomic(src_addr);
> -                unlock_page(dest_page);
> -                unlock_page(src_page);
> +unlock:
> +                vfs_unlock_two_pages(src_page, dest_page);
>                  put_page(dest_page);
>                  put_page(src_page);
>

--
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
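The locking rule in the commit message can be illustrated outside the kernel. It has two parts: take the lower-offset page first, and when both pages are the same, lock and unlock that page only once. What follows is a minimal userspace sketch that assumes POSIX mutexes stand in for page locks. The names struct fake_page, order_two_pages(), lock_two_pages(), unlock_two_pages() and page_is_locked() are hypothetical, chosen for illustration, and are not kernel APIs. The patch's real helpers are vfs_lock_two_pages() and vfs_unlock_two_pages(), built on lock_page() and unlock_page().

```c
/* Hypothetical userspace illustration of the patch's locking rule; not kernel code. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a page-cache page: an index (the page's offset
 * within its file, in pages) plus a lock. This is NOT the kernel's
 * struct page; a pthread mutex plays the role of the page lock.
 */
struct fake_page {
        unsigned long index;
        pthread_mutex_t lock;
};

/* Order the two pointers by increasing index (like swap() in the patch). */
static void order_two_pages(struct fake_page **p1, struct fake_page **p2)
{
        if ((*p1)->index > (*p2)->index) {
                struct fake_page *tmp = *p1;

                *p1 = *p2;
                *p2 = tmp;
        }
}

/*
 * Lock two pages, lowest index first. For two distinct pages of the same
 * file, every caller then takes the locks in the same order, which avoids
 * ABBA deadlocks. As in the patch, equal indexes are not ordered further.
 * If both arguments are the same page, lock it only once: locking a
 * non-recursive mutex twice from one thread deadlocks or is undefined.
 */
static void lock_two_pages(struct fake_page *p1, struct fake_page *p2)
{
        assert(p1 != NULL && p2 != NULL);
        order_two_pages(&p1, &p2);
        pthread_mutex_lock(&p1->lock);
        if (p1 != p2)
                pthread_mutex_lock(&p2->lock);
}

/* Unlock two pages, never unlocking the same page twice. */
static void unlock_two_pages(struct fake_page *p1, struct fake_page *p2)
{
        assert(p1 != NULL && p2 != NULL);
        pthread_mutex_unlock(&p1->lock);
        if (p1 != p2)
                pthread_mutex_unlock(&p2->lock);
}

/* Return 1 if the page's lock is currently held by anyone, else 0. */
static int page_is_locked(struct fake_page *p)
{
        int ret = pthread_mutex_trylock(&p->lock);

        if (ret == 0) {
                pthread_mutex_unlock(&p->lock);
                return 0;
        }
        return ret == EBUSY;
}
```

For example, lock_two_pages(&a, &a) takes a's lock once, and unlock_two_pages(&a, &a) releases it once. This is the same-page case that the commit message says the overlapping range check misses when blocksize < pagesize. The sketch covers only lock ordering. After locking, the patch also rechecks each page's mapping and index, because, as its comment notes, someone may be invalidating the pages.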