On Mon, Jul 22, 2019 at 04:13:37PM -0400, Johannes Weiner wrote: > psi tracks the time tasks wait for refaulting pages to become > uptodate, but it does not track the time spent submitting the IO. The > submission part can be significant if backing storage is contended or > when cgroup throttling (io.latency) is in effect - a lot of time is > spent in submit_bio(). In that case, we underreport memory pressure. > > Annotate the submit_bio() paths (or the indirection through readpage) > for refaults and swapin to get proper psi coverage of delays there. > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> > --- > fs/btrfs/extent_io.c | 14 ++++++++++++-- > fs/ext4/readpage.c | 9 +++++++++ > fs/f2fs/data.c | 8 ++++++++ > fs/mpage.c | 9 +++++++++ > mm/filemap.c | 20 ++++++++++++++++++++ > mm/page_io.c | 11 ++++++++--- > mm/readahead.c | 24 +++++++++++++++++++++++- > 7 files changed, 89 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 1eb671c16ff1..2d2b3239965a 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -13,6 +13,7 @@ > #include <linux/pagevec.h> > #include <linux/prefetch.h> > #include <linux/cleancache.h> > +#include <linux/psi.h> > #include "extent_io.h" > #include "extent_map.h" > #include "ctree.h" > @@ -4267,6 +4268,9 @@ int extent_readpages(struct address_space *mapping, struct list_head *pages, > struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree; > int nr = 0; > u64 prev_em_start = (u64)-1; > + int ret = 0; > + bool refault = false; > + unsigned long pflags; > > while (!list_empty(pages)) { > u64 contig_end = 0; > @@ -4281,6 +4285,10 @@ int extent_readpages(struct address_space *mapping, struct list_head *pages, > put_page(page); > break; > } > + if (PageWorkingset(page) && !refault) { > + psi_memstall_enter(&pflags); > + refault = true; > + } > > pagepool[nr++] = page; > contig_end = page_offset(page) + PAGE_SIZE - 1; > @@ -4301,8 +4309,10 @@ int extent_readpages(struct address_space *mapping, struct list_head *pages, > free_extent_map(em_cached); > > if (bio) > - return submit_one_bio(bio, 0, bio_flags); > - return 0; > + ret = submit_one_bio(bio, 0, bio_flags); > + if (refault) > + psi_memstall_leave(&pflags); > + return ret; This all seems extremely fragile to me. Sprinkling magic, undocumented pixie dust through the IO paths to account for something nobody can actually determine is working correctly is a bad idea. People are going to break this without knowing it, nobody is going to notice because there are no regression tests for it, and this will all end up in frustration for users because it constantly gets broken and doesn't work reliably. e.g. If this is needed around all calls to ->readpage(), then please write a readpage wrapper function and convert all the callers to use that wrapper. Even better: If this memstall and "refault" check is needed to account for bio submission blocking, then page cache iteration is the wrong place to be doing this check. It should be done entirely in the bio code when adding pages to the bio because we'll only ever be doing page cache read IO on page cache misses. i.e. this isn't dependent on adding a new page to the LRU or not - if we add a new page then we are going to be doing IO and so this does not require magic pixie dust at the page cache iteration level e.g. bio_add_page_memstall() can do the working set check and then set a flag on the bio to say it contains a memstall page. Then on submission of the bio the memstall condition can be cleared. Cheers, -Dave. -- Dave Chinner david@xxxxxxxxxxxxx