[PATCH 45/45] btrfs: fix race on syncing the btree inode

When doing sync, the btree dirty pages refuse to go away for tens of seconds:

# vmmon -d 1 nr_writeback nr_dirty nr_unstable

     nr_writeback         nr_dirty      nr_unstable
            46641            23315                0
            46641            23380                0
            46641            23381                0
            26674            43206                0
            18963            51006                0
            11252            58721                0
             3528            66419                0
                0            70024                0
                0            70024                0
                0            70024                0
                0            70024                0
                0            70024                0
                0            70024                0
                0            70024                0
                0            70024                0

Note that the 70024 pages are under the btree inode's 32MB
no-write-metadata threshold, so a WB_SYNC_NONE call to
btree_writepages() returns without writing them. This is racy:
the sync work has to sleep and keep retrying for data integrity.
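
To make the skip condition concrete, here is a minimal user-space sketch of
the early-return logic that this patch removes. It is only a model: the
function name old_btree_skips_writeback() and the METADATA_THRESH macro are
made up for illustration, and the enum merely mirrors the kernel's
WB_SYNC_NONE/WB_SYNC_ALL names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel's writeback sync modes (illustration only). */
enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

/* 32MB, the threshold hard-coded in the old btree_writepages(). */
#define METADATA_THRESH (32ULL * 1024 * 1024)

/*
 * Hypothetical helper modelling the old early returns: true means the
 * call would return 0 without writing any btree pages.
 */
bool old_btree_skips_writeback(enum sync_mode mode, bool for_kupdate,
			       uint64_t dirty_metadata_bytes)
{
	if (mode != WB_SYNC_NONE)
		return false;		/* integrity writeback always proceeds */
	if (for_kupdate)
		return true;		/* periodic writeback is always skipped */
	return dirty_metadata_bytes < METADATA_THRESH;
}
```

Under this model, a WB_SYNC_NONE pass with, say, 16MB of dirty metadata
writes nothing, while a WB_SYNC_ALL pass with the same amount is never
short-circuited.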

The 32MB threshold may also become a problem for background
writeback on a memory-tight box. So it may be better to
replace the threshold with some informed writeback tricks.

CC: Chris Mason <chris.mason@xxxxxxxxxx> 
Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
 fs/btrfs/disk-io.c |   29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

--- linux.orig/fs/btrfs/disk-io.c	2009-10-07 14:31:45.000000000 +0800
+++ linux/fs/btrfs/disk-io.c	2009-10-07 14:32:55.000000000 +0800
@@ -707,22 +707,19 @@ static int btree_writepage(struct page *
 static int btree_writepages(struct address_space *mapping,
 			    struct writeback_control *wbc)
 {
-	struct extent_io_tree *tree;
-	tree = &BTRFS_I(mapping->host)->io_tree;
-	if (wbc->sync_mode == WB_SYNC_NONE) {
-		struct btrfs_root *root = BTRFS_I(mapping->host)->root;
-		u64 num_dirty;
-		unsigned long thresh = 32 * 1024 * 1024;
-
-		if (wbc->for_kupdate)
-			return 0;
-
-		/* this is a bit racy, but that's ok */
-		num_dirty = root->fs_info->dirty_metadata_bytes;
-		if (num_dirty < thresh)
-			return 0;
-	}
-	return extent_writepages(tree, mapping, btree_get_extent, wbc);
+	struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree;
+	int ret;
+
+	if (!wbc->for_sync)
+		wbc->nr_segments = 1;
+	ret = extent_writepages(tree, mapping, btree_get_extent, wbc);
+	/*
+	 * Fake some skipped pages, so that VFS won't
+	 * try hard on writing this inode.
+	 */
+	if (!wbc->for_sync)
+		wbc->pages_skipped++;
+	return ret;
 }
 
 static int btree_readpage(struct file *file, struct page *page)

