On 2019/2/7 6:11 AM, Nix wrote:
> So I just upgraded to 4.20 and revived my long-turned-off bcache now
> that the metadata corruption leading to mount failure on dirty close may
> have been identified (applying Tang Junhui's patch to do so)... and I
> spotted something a bit disturbing. It appears that XFS directory and
> metadata I/O is going more or less entirely uncached.
>
> Here's some bcache stats before and after a git status of a *huge*
> uncached tree (Chromium) on my no-writeback readaround cache. It takes
> many minutes and pounds the disk with massively seeky metadata I/O in
> the process:
>
> Before:
>
> stats_total/bypassed: 48.3G
> stats_total/cache_bypass_hits: 7942
> stats_total/cache_bypass_misses: 861045
> stats_total/cache_hit_ratio: 3
> stats_total/cache_hits: 16286
> stats_total/cache_miss_collisions: 25
> stats_total/cache_misses: 411575
> stats_total/cache_readaheads: 0
>
> After:
>
> stats_total/bypassed: 49.3G
> stats_total/cache_bypass_hits: 7942
> stats_total/cache_bypass_misses: 1154887
> stats_total/cache_hit_ratio: 3
> stats_total/cache_hits: 16291
> stats_total/cache_miss_collisions: 25
> stats_total/cache_misses: 411625
> stats_total/cache_readaheads: 0
>
> Huge increase in bypassed reads, essentially no new cached reads. This
> is... basically the optimum case for bcache, and it's not caching it!
>
> From my reading of xfs_dir2_leaf_readbuf(), it looks like essentially
> all directory reads in XFS appear to bcache as a single non-readahead
> read followed by a pile of readahead I/O: bcache bypasses readahead
> bios, so all directory reads (or perhaps all directory reads larger
> than a single block) are going to be bypassed out of hand.
>
> This seems... suboptimal, but so does filling up the cache with
> read-ahead blocks (particularly for non-metadata) that are never used.
> Anyone got any ideas, 'cos I'm currently at a loss: XFS doesn't appear
> to let us distinguish between "read-ahead just in case but almost
> certain to be accessed" (like directory blocks) and "read ahead on the
> offchance because someone did a single-block file read and what the
> hell let's suck in a bunch more".
>
> As it is, this seems to render bcache more or less useless with XFS,
> since bcache's primary raison d'etre is precisely to cache seeky stuff
> like metadata. :(

Hi Nix,

Could you please try whether the attached patch makes things better?

Thanks in advance for your help.

-- 
Coly Li
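To make the regression concrete before reading the patch: below is a
minimal standalone sketch (not kernel code; the flag bit values are made
up for illustration, the real definitions live in
include/linux/blk_types.h) of how the bypass decision in
check_should_bypass() treats an XFS directory readahead bio, which
carries REQ_META but not REQ_PRIO. bypassed_after_patch() shows the same
check with the attached patch applied.

/*
 * Standalone simulation of bcache's readahead bypass check, before
 * and after the patch.  Flag values are illustrative only; they are
 * NOT the kernel's actual bit layout.
 */
#include <stdbool.h>
#include <stdio.h>

#define REQ_RAHEAD	(1u << 0)
#define REQ_BACKGROUND	(1u << 1)
#define REQ_PRIO	(1u << 2)
#define REQ_META	(1u << 3)

/* Pre-patch: only REQ_PRIO exempts a readahead bio from bypass. */
static bool bypassed_before_patch(unsigned int bi_opf)
{
	return (bi_opf & (REQ_RAHEAD | REQ_BACKGROUND)) &&
	       !(bi_opf & REQ_PRIO);
}

/* Post-patch: either REQ_META or REQ_PRIO exempts it. */
static bool bypassed_after_patch(unsigned int bi_opf)
{
	return (bi_opf & (REQ_RAHEAD | REQ_BACKGROUND)) &&
	       !(bi_opf & (REQ_META | REQ_PRIO));
}

int main(void)
{
	/* XFS directory readahead: REQ_META set, REQ_PRIO not set. */
	unsigned int xfs_dir_ra = REQ_RAHEAD | REQ_META;

	printf("before patch: %s\n",
	       bypassed_before_patch(xfs_dir_ra) ? "bypassed" : "cached");
	printf("after patch:  %s\n",
	       bypassed_after_patch(xfs_dir_ra) ? "bypassed" : "cached");
	return 0;
}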
From 7c27e26017f6297a6bc6a8075732f69d3edcc52e Mon Sep 17 00:00:00 2001
From: Coly Li <colyli@xxxxxxx>
Date: Thu, 7 Feb 2019 15:54:24 +0800
Subject: [PATCH] bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata

In commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio for
metadata"), REQ_META was replaced by REQ_PRIO to indicate a metadata
bio. This assumption is not always correct: XFS, for example, uses
REQ_META rather than REQ_PRIO to mark its metadata bios. This is why
Nix reports a regression in which bcache no longer caches metadata for
XFS after the above commit.

Thanks to Dave Chinner, who explains the difference between REQ_META
and REQ_PRIO from a file system developer's point of view. Here I quote
part of his explanation from the mailing list:

    REQ_META is used for metadata. REQ_PRIO is used to communicate to
    the lower layers that the submitter considers this IO to be more
    important than non REQ_PRIO IO and so dispatch should be expedited.

    IOWs, if the filesystem considers metadata IO to be more important
    than user data IO, then it will use REQ_PRIO | REQ_META rather than
    just REQ_META.

It therefore seems that bios with either REQ_META or REQ_PRIO should be
cached for performance optimization, because both suggest that the
upper layer (e.g. the file system) wants low I/O latency. So in this
patch, when we want to check whether a bio is metadata related,
REQ_META and REQ_PRIO are both checked. Then both metadata and high
priority I/O requests will be handled properly.

Reported-by: Nix <nix@xxxxxxxxxxxxx>
Signed-off-by: Coly Li <colyli@xxxxxxx>
Cc: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: Andre Noll <maan@xxxxxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
---
 drivers/md/bcache/request.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 3bf35914bb57..62bda90a38dc 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -395,7 +395,7 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
 	 * unless the read-ahead request is for metadata (eg, for gfs2 or xfs).
 	 */
 	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
-	    !(bio->bi_opf & REQ_PRIO))
+	    !(bio->bi_opf & (REQ_META|REQ_PRIO)))
 		goto skip;
 
 	if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) ||
@@ -877,7 +877,7 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
 	}
 
 	if (!(bio->bi_opf & REQ_RAHEAD) &&
-	    !(bio->bi_opf & REQ_PRIO) &&
+	    !(bio->bi_opf & (REQ_META|REQ_PRIO)) &&
 	    s->iop.c->gc_stats.in_use < CUTOFF_CACHE_READA)
 		reada = min_t(sector_t, dc->readahead >> 9,
 			      get_capacity(bio->bi_disk) - bio_end_sector(bio));
-- 
2.16.4
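A note on the second hunk: cached_dev_cache_miss() governs a different
policy, namely whether bcache extends a cache miss with its own
readahead, and with the patch applied REQ_META bios are excluded from
that extension just as REQ_PRIO bios already were. The sketch further
up can be built and run with something like the following (the file
name bypass_sim.c is arbitrary):

    gcc -Wall -o bypass_sim bypass_sim.c && ./bypass_sim

It should print "bypassed" for the pre-patch check and "cached" for the
patched one. If the patch behaves as intended on real hardware,
repeating Nix's git status test should grow stats_total/cache_misses
(and, on a second run over the same tree, stats_total/cache_hits)
rather than stats_total/bypassed and stats_total/cache_bypass_misses.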