On 09/09/2016 04:17 AM, Jan Kara wrote:
On Mon 22-08-16 13:35:01, Josef Bacik wrote:
Provide a mechanism for file systems to indicate how much dirty metadata they
are holding. This introduces a few things:
1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
2) WB stat for dirty metadata. This way we know if we need to try and call into
the file system to write out metadata. This could potentially be used in the
future to make balancing of dirty pages smarter.
So I'm curious about one thing: In the previous posting you have mentioned
that the main motivation for this work is to have a simple support for
sub-pagesize dirty metadata blocks that need tracking in btrfs. However you
do the dirty accounting at page granularity. What are your plans to handle
this mismatch?
We already track how much dirty metadata we have internally in btrfs, I
envisioned the subpage blocksize guys just calling the accounting every N objects
that were dirtied in order to keep the accounting correct. This is not great,
but it was better than the hoops we needed to jump through to deal with the
btree_inode and subpagesize blocksizes.
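
To make the "every N objects" idea concrete, here is a minimal userspace sketch, not btrfs code. It assumes N = page size / block size, and struct meta_acct and both helpers are hypothetical names invented for illustration. The file system keeps an exact block count and only moves the page-granular counter when a full page worth of blocks has been dirtied or cleaned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-granular dirty metadata counter. */
struct meta_acct {
	size_t blocks_per_page;	/* N = page size / block size */
	size_t dirty_blocks;	/* exact count kept inside the file system */
	long dirty_pages;	/* what a page-granular counter would see */
};

static void meta_acct_init(struct meta_acct *a, size_t page_size,
			   size_t block_size)
{
	assert(block_size > 0 && page_size % block_size == 0);
	a->blocks_per_page = page_size / block_size;
	a->dirty_blocks = 0;
	a->dirty_pages = 0;
}

/* Returns the change applied to the page counter (0 or 1). */
static long meta_block_dirtied(struct meta_acct *a)
{
	a->dirty_blocks++;
	if (a->dirty_blocks % a->blocks_per_page == 0) {
		a->dirty_pages++;
		return 1;
	}
	return 0;
}

/* Returns the change applied to the page counter (0 or -1). */
static long meta_block_cleaned(struct meta_acct *a)
{
	long delta = 0;

	assert(a->dirty_blocks > 0);
	/* Leaving a multiple of N means one fewer full page is dirty. */
	if (a->dirty_blocks % a->blocks_per_page == 0) {
		a->dirty_pages--;
		delta = -1;
	}
	a->dirty_blocks--;
	return delta;
}
```

With this scheme dirty_pages always equals dirty_blocks / N rounded down, so the page counter undercounts by less than one page per accounting instance.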
The thing is you actually shouldn't miscount by too much, as that could
upset some checks in mm that look at how many dirty pages a node has to decide
how reclaim should be done... But it's a question whether NR_METADATA_DIRTY
should actually be used in the checks in node_limits_ok() or in
node_pagecache_reclaimable() at all, because once you start accounting dirty
slab objects, you are really on thin ice...
Agreed, this does get a bit ugly.
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 56c8fda..d329f89 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1809,6 +1809,7 @@ static unsigned long get_nr_dirty_pages(void)
{
return global_node_page_state(NR_FILE_DIRTY) +
global_node_page_state(NR_UNSTABLE_NFS) +
+ global_node_page_state(NR_METADATA_DIRTY) +
get_nr_dirty_inodes();
This is also connected to my question: when we have NR_METADATA_DIRTY,
we could just account dirty inodes there and get rid of this
get_nr_dirty_inodes() hack...
But actually getting this to work right to be able to track dirty inodes would
be useful on its own - some throttling of creation of dirty inodes would be
useful for several filesystems (ext4, xfs, ...).
So I suppose what I could do is instead provide a callback for the vm to ask how
many dirty objects we have in the file system, instead of adding another page
counter. That way the actual accounting is kept internal to the file system,
and it gets rid of the weird mismatch when blocksize < pagesize. Does that
sound like a more acceptable approach? Unfortunately I decided to do this work
to make the blocksize < pagesize work easier, but then didn't actually think
about how the accounting would interact with that case, because I'm an idiot.
I think that looping through all the sb's in the system would be kinda shitty
for this tho, we want the "get number of dirty pages" part to be relatively
fast. What if I do something like the shrinker_control, only for dirty objects?
So the fs registers some dirty_objects_control, we call into each of those and
get the counts from that. Does that sound less crappy? Thanks,
Josef
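
For illustration only, the registration idea above could look roughly like this as a userspace sketch. None of this is an existing kernel API: struct dirty_objects_control, the register/unregister helpers and total_dirty_objects() are hypothetical names, and a real version would need locking (or RCU) around the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-filesystem hook, loosely modeled on the shrinker idea. */
struct dirty_objects_control {
	/* Returns the number of dirty objects the file system holds. */
	unsigned long (*count_dirty)(void *priv);
	void *priv;				/* file system private data */
	struct dirty_objects_control *next;	/* registry linkage */
};

/* Global registry; unlocked here, which a real version could not be. */
static struct dirty_objects_control *dirty_controls;

static void register_dirty_objects_control(struct dirty_objects_control *c)
{
	assert(c && c->count_dirty);
	c->next = dirty_controls;
	dirty_controls = c;
}

static void unregister_dirty_objects_control(struct dirty_objects_control *c)
{
	struct dirty_objects_control **pp = &dirty_controls;

	while (*pp && *pp != c)
		pp = &(*pp)->next;
	if (*pp)
		*pp = c->next;
}

/* Walks only the registered controls instead of every super block. */
static unsigned long total_dirty_objects(void)
{
	unsigned long sum = 0;
	struct dirty_objects_control *c;

	for (c = dirty_controls; c; c = c->next)
		sum += c->count_dirty(c->priv);
	return sum;
}

/* Example callback: the file system just exposes its own counter. */
static unsigned long example_count(void *priv)
{
	return *(unsigned long *)priv;
}
```

The accounting stays internal to each file system; the caller only sums whatever each registered callback reports, so the cost scales with the number of registered controls rather than the number of super blocks.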