On Wed, Dec 19, 2018 at 09:32:55AM +0800, Junhui Tang wrote: > hello Kent > > Long long no see, glad to hear you again. > > >> Then two steps: > >> A) update k1 to k2 in btree node memory; > >> bch_btree_insert_keys(b, op, insert_keys, replace_key) > >> B) Write the bset(contains k2) to cache disk by a 30s delay work > >> bch_btree_leaf_dirty(b, journal_ref). > >> But before the 30s delay work write the bset to cache device, > >> these things happend: > >> A) GC works, and reclaim the bucket k2 point to; > >> B) Allocator works, and invalidate the bucket k2 point to, > >> and increase the gen of the bucket, and place it into free_inc > >> fifo; > >> C) Until now, the 30s delay work still does not finish work, > >> so in the disk, the key still is k1, it is dirty and stale > >> (its gen is smaller than the gen of the bucket). and then the > >> machine power off suddenly happens; > >> D) When the machine power on again, after the btree reconstruction, > >> the stale dirty key appear. > > > Only prior to journal replay, right? Or did you uncover something more severe? > No, it's after the journal replay, and in write_dirty_finish(), when > replace a dirty key with a clean key by calling bch_btree_insert(), > no journal will write. Holy crap you're right, this was from before I moved journalling to be driven by the btree update path. I think a better fix here would be to journal the btree updates writeback does, but given that we haven't been journalling those updates all this time your fix does make sense too. > > >> In bch_extent_bad(), when expensive_debug_checks is off, it would > >> treat the dirty key as good even it is stale keys, and it would > >> cause bellow probelms: > >> A) In read_dirty() it would cause machine crash: > >> BUG_ON(ptr_stale(dc->disk.c, &w->key, 0)); > >> B) It could be worse when reads hits stale dirty keys, it would > >> read old incorrect data. > > >Neither of these can happen until after journal replay is finished. Prior to > >journal replay we expect to find stale dirty keys - if we find any after journal > >replay then it's indicative of a real bug. > As I said previous, since no journal writes after inserting a replace key in > writeback, so this issue has nothing to do with journal. > > This is a real problem in my environment, after running IO sometimes, I turn off > the power suddenly, then turn on the power, and the machine crash in > read_dirty() due to the stale && dirty keys.