Re: [PATCH] bcache: treat stale && dirty keys as bad keys

Junhui Tang <tang.junhui.linux@xxxxxxxxx> · Thu, 20 Dec 2018 16:40:06 +0800

Kent Overstreet <kent.overstreet@xxxxxxxxx> 于2018年12月19日周三 下午7:32写道：
>
> On Wed, Dec 19, 2018 at 09:32:55AM +0800, Junhui Tang wrote:
> > hello Kent
> >
> > Long long no see, glad to hear you again.
> >
> > >> Then two steps:
> > >> A) update k1 to k2 in btree node memory;
> > >>    bch_btree_insert_keys(b, op, insert_keys, replace_key)
> > >> B) Write the bset(contains k2) to cache disk by a 30s delay work
> > >>    bch_btree_leaf_dirty(b, journal_ref).
> > >> But before the 30s delay work write the bset to cache device,
> > >> these things happend:
> > >> A) GC works, and reclaim the bucket k2 point to;
> > >> B) Allocator works, and invalidate the bucket k2 point to,
> > >>    and increase the gen of the bucket, and place it into free_inc
> > >>    fifo;
> > >> C) Until now, the 30s delay work still does not finish work,
> > >>    so in the disk, the key still is k1, it is dirty and stale
> > >>    (its gen is smaller than the gen of the bucket). and then the
> > >>    machine power off suddenly happens;
> > >> D) When the machine power on again, after the btree reconstruction,
> > >>    the stale dirty key appear.
> >
> > > Only prior to journal replay, right? Or did you uncover something more severe?
> > No, it's after the journal replay, and in write_dirty_finish(), when
> > replace a dirty key with a clean key by calling bch_btree_insert(),
> > no journal will write.
>
> Holy crap you're right, this was from before I moved journalling to be driven by
> the btree update path.
>
> I think a better fix here would be to journal the btree updates writeback does,
> but given that we haven't been journalling those updates all this time your fix
> does make sense too.

Journal those updates would consume lots of resources, so I don't think it a
good way. And this fix is very simple.

Kent, Could you add an Acked-by or Reviewed-by for this patch? So Coly
can put it to upstream.
>
> >
> > >> In bch_extent_bad(), when expensive_debug_checks is off, it would
> > >> treat the dirty key as good even it is stale keys, and it would
> > >> cause bellow probelms:
> > >> A) In read_dirty() it would cause machine crash:
> > >>    BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> > >> B) It could be worse when reads hits stale dirty keys, it would
> > >>    read old incorrect data.
> >
> > >Neither of these can happen until after journal replay is finished. Prior to
> > >journal replay we expect to find stale dirty keys - if we find any after journal
> > >replay then it's indicative of a real bug.
> > As I said previous, since no journal writes after inserting a replace key in
> > writeback, so this issue has nothing to do with journal.
> >
> > This is a real problem in my environment, after running IO sometimes, I turn off
> > the power suddenly,  then turn on the power, and the machine crash in
> > read_dirty() due to the stale && dirty keys.