[PATCH] btrfs: save i_size in compress_file_range

Josef Bacik <josef@xxxxxxxxxxxxxx> · Fri, 11 Oct 2019 09:03:54 -0400

We hit a regression while rolling out 5.2 internally where we were
hitting the following panic

kernel BUG at mm/page-writeback.c:2659!
RIP: 0010:clear_page_dirty_for_io+0xe6/0x1f0
Call Trace:
 __process_pages_contig+0x25a/0x350
 ? extent_clear_unlock_delalloc+0x43/0x70
 submit_compressed_extents+0x359/0x4d0
 normal_work_helper+0x15a/0x330
 process_one_work+0x1f5/0x3f0
 worker_thread+0x2d/0x3d0
 ? rescuer_thread+0x340/0x340
 kthread+0x111/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x1f/0x30

this is happening because the page is not locked when doing
clear_page_dirty_for_io.  Looking at the core dump it was because our
async_extent had a ram_size of 24576 but our async_chunk range only
spanned 20480, so we had a whole extra page in our ram_size for our
async_extent.

This happened because we try not to compress pages outside of our
i_size, however a cleanup patch changed us to do

actual_end = min_t(u64, i_size_read(inode), end + 1);

which is problematic because i_size_read() can evaluate to different
values in between checking and assigning.  So either a expanding
truncate or a fallocate could increase our i_size while we're doing
writeout and actual_end would end up being past the range we have
locked.

I confirmed this was what was happening by installing a debug kernel
that had

actual_end = min_t(u64, i_size_read(inode), end + 1);
if (actual_end > end + 1) {
	printk(KERN_ERR "WE GOT FUCKED\n");
	actual_end = end + 1;
}

and installing it onto 500 boxes of the tier that had been seeing the
problem regularly.  Last night I got my debug message and no panic,
confirming what I expected.

Fixes: 62b37622718c ("btrfs: Remove isize local variable in compress_file_range")
cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
---
 fs/btrfs/inode.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2eb1d7249f83..9a483d1f61f8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -474,6 +474,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 	u64 start = async_chunk->start;
 	u64 end = async_chunk->end;
 	u64 actual_end;
+	loff_t i_size = i_size_read(inode);
 	int ret = 0;
 	struct page **pages = NULL;
 	unsigned long nr_pages;
@@ -488,7 +489,13 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 	inode_should_defrag(BTRFS_I(inode), start, end, end - start + 1,
 			SZ_16K);
 
-	actual_end = min_t(u64, i_size_read(inode), end + 1);
+	/*
+	 * We need to save i_size before now because it could change in between
+	 * us evaluating the size and assigning it.  This is because we lock and
+	 * unlock the page in truncate and fallocate, and then modify the i_size
+	 * later on.
+	 */
+	actual_end = min_t(u64, i_size, end + 1);
 again:
 	will_compress = 0;
 	nr_pages = (end >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
-- 
2.21.0