On Tue, Dec 18, 2018 at 02:37:14PM +0800, Junhui Tang wrote: > From 8ca52771f1d3b83ff55dc80ba633901a081963bf Mon Sep 17 00:00:00 2001 > From: Tang Junhui <tang.junhui.linux@xxxxxxxxx> > Date: Tue, 18 Dec 2018 10:38:26 +0800 > Subject: [PATCH] bcache: treat stale && dirty keys as bad keys > > Stale && dirty keys can be produced in the follow way: > After writeback in write_dirty_finish(), dirty keys k1 will > replace by clean keys k2 > ==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key); > ==>btree_insert_fn(struct btree_op *b_op, struct btree *b) > ==>static int bch_btree_insert_node(struct btree *b, > struct btree_op *op, > struct keylist *insert_keys, > atomic_t *journal_ref, > Then two steps: > A) update k1 to k2 in btree node memory; > bch_btree_insert_keys(b, op, insert_keys, replace_key) > B) Write the bset(contains k2) to cache disk by a 30s delay work > bch_btree_leaf_dirty(b, journal_ref). > But before the 30s delay work write the bset to cache device, > these things happend: > A) GC works, and reclaim the bucket k2 point to; > B) Allocator works, and invalidate the bucket k2 point to, > and increase the gen of the bucket, and place it into free_inc > fifo; > C) Until now, the 30s delay work still does not finish work, > so in the disk, the key still is k1, it is dirty and stale > (its gen is smaller than the gen of the bucket). and then the > machine power off suddenly happens; > D) When the machine power on again, after the btree reconstruction, > the stale dirty key appear. Only prior to journal replay, right? Or did you uncover something more severe? > In bch_extent_bad(), when expensive_debug_checks is off, it would > treat the dirty key as good even it is stale keys, and it would > cause bellow probelms: > A) In read_dirty() it would cause machine crash: > BUG_ON(ptr_stale(dc->disk.c, &w->key, 0)); > B) It could be worse when reads hits stale dirty keys, it would > read old incorrect data. Neither of these can happen until after journal replay is finished. Prior to journal replay we expect to find stale dirty keys - if we find any after journal replay then it's indicative of a real bug. > > This patch tolerate the existence of these stale && dirty keys, > and treat them as bad key in bch_extent_bad(). > > Signed-off-by: Tang Junhui <tang.junhui.linux@xxxxxxxxx> > --- > drivers/md/bcache/extents.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c > index 1d09674..f9eb03c 100644 > --- a/drivers/md/bcache/extents.c > +++ b/drivers/md/bcache/extents.c > @@ -535,6 +535,7 @@ static bool bch_extent_bad(struct btree_keys *bk, > const struct bkey *k) > { > struct btree *b = container_of(bk, struct btree, keys); > unsigned i, stale; > + char buf[80]; > > if (!KEY_PTRS(k) || > bch_extent_invalid(bk, k)) > @@ -544,19 +545,20 @@ static bool bch_extent_bad(struct btree_keys > *bk, const struct bkey *k) > if (!ptr_available(b->c, k, i)) > return true; > > - if (!expensive_debug_checks(b->c) && KEY_DIRTY(k)) > - return false; > - > for (i = 0; i < KEY_PTRS(k); i++) { > stale = ptr_stale(b->c, k, i); > > + if (stale && KEY_DIRTY(k)) { > + bch_extent_to_text(buf, sizeof(buf), k); > + pr_info("stale dirty pointer, stale %u, key: %s", > + stale, > + buf); > + } > + > btree_bug_on(stale > 96, b, > "key too stale: %i, need_gc %u", > stale, b->c->need_gc); > > - btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k), > - b, "stale dirty pointer"); > - > if (stale) > return true; > > -- > 1.8.3.1