On 2017/9/20 下午6:07, Kent Overstreet wrote: > On Wed, Sep 20, 2017 at 06:24:33AM +0800, Coly Li wrote: >> When bcache does read I/Os, for example in writeback or writethrough mode, >> if a read request on cache device is failed, bcache will try to recovery >> the request by reading from cached device. If the data on cached device is >> not synced with cache device, then requester will get a stale data. >> >> For critical storage system like database, providing stale data from >> recovery may result an application level data corruption, which is >> unacceptible. But for some other situation like multi-media stream cache, >> continuous service may be more important and it is acceptible to fetch >> a chunk of stale data. >> >> This patch tries to solve the above conflict by adding a sysfs option >> /sys/block/bcache<idx>/bcache/allow_stale_data_on_failure >> which is defaultly cleared (to 0) as disabled. Now people can make choices >> for different situations. > > IMO this is just a bug, I'd rather not have an option to keep the buggy > behaviour. How about this patch: > Hi Kent, OK, last time when I discuss with other bcache developers, people wanted to keep this behavior, then I modify it as an option in this version patch. I support fix it without an option, because there are too many options already. Good to know you have similar decision :-) > commit 2746f9c1f962288d8c5d7dabe698bf7b3fddd405 > Author: Kent Overstreet <kent.overstreet@xxxxxxxxx> > Date: Wed Sep 20 18:06:37 2017 +0200 > > bcache: Don't recover from IO errors when reading dirty data > > Signed-off-by: Kent Overstreet <kent.overstreet@xxxxxxxxx> > > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c > index 382397772a..c2d57ef953 100644 > --- a/drivers/md/bcache/request.c > +++ b/drivers/md/bcache/request.c > @@ -532,8 +532,10 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k) > > PTR_BUCKET(b->c, k, ptr)->prio = INITIAL_PRIO; > > - if (KEY_DIRTY(k)) > + if (KEY_DIRTY(k)) { > s->read_dirty_data = true; > + s->recoverable = false; > + } > I though of fixing here, the reason I gave up to modify here was, cache_lookup_fn() is called for keys in leaf nodes (b->level == 0), bch_btree_map_keys_recurse() needs to do I/O to fetch upper level nodes before accessing leaf node. When a SSD failed bch_btree_node_get() will fail before cache_lookup_fn() is executed. So the your patch, there is no chance to set s->recoverable to false, recovery still happens. If you don't like an option, the following modification should be much simpler, diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 681b4f12b05a..f397785d9c38 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -697,8 +697,10 @@ static void cached_dev_read_error(struct closure *cl) { struct search *s = container_of(cl, struct search, cl); struct bio *bio = &s->bio.bio; + struct cached_dev *dc = container_of(s->d, struct cached_dev, disk); - if (s->recoverable) { + if (s->recoverable && + (dc && !atomic_read(&dc->has_dirty)) { /* Retry from the backing device: */ trace_bcache_read_retry(s->orig_bio); This might be the simplest way I know for now. Thanks. Coly Li