Re: [PATCH] bcache: treat stale && dirty keys as bad keys

Hello Kent,

Long time no see, glad to hear from you again.

>> Then two steps:
>> A) update k1 to k2 in btree node memory;
>>    bch_btree_insert_keys(b, op, insert_keys, replace_key)
>> B) Write the bset(contains k2) to cache disk by a 30s delay work
>>    bch_btree_leaf_dirty(b, journal_ref).
>> But before the 30s delayed work writes the bset to the cache device,
>> these things happened:
>> A) GC runs and reclaims the bucket k2 points to;
>> B) The allocator runs, invalidates the bucket k2 points to,
>>    increases the gen of the bucket, and places it into the free_inc
>>    fifo;
>> C) At this point the 30s delayed work still has not finished,
>>    so on disk the key is still k1, which is dirty and stale
>>    (its gen is smaller than the gen of the bucket), and then the
>>    machine suddenly loses power;
>> D) When the machine powers on again, after the btree reconstruction,
>>    the stale dirty key appears.

> Only prior to journal replay, right? Or did you uncover something more severe?
No, it happens after journal replay. In write_dirty_finish(), when a
dirty key is replaced with a clean key by calling bch_btree_insert(),
nothing is written to the journal.
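
To make the sequence quoted above concrete, here is a small userspace
model of it. It is only a sketch, not kernel code: every model_* name is
hypothetical, the structures are reduced to the two fields that matter,
and it assumes the increased bucket gen reaches the disk before the
power loss while the updated key does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for a bucket and a key pointer. */
struct model_bucket { uint8_t gen; };
struct model_key { uint8_t ptr_gen; bool dirty; };

/* Wraparound-aware distance: how far bucket gen a is ahead of key gen b. */
static uint8_t model_gen_after(uint8_t a, uint8_t b)
{
	uint8_t r = (uint8_t)(a - b);

	return r > 128U ? 0 : r;
}

/* A key is stale when its bucket has been invalidated since it was written. */
static bool model_ptr_stale(const struct model_bucket *b,
			    const struct model_key *k)
{
	return model_gen_after(b->gen, k->ptr_gen) != 0;
}

/* Allocator step: invalidating a bucket increases its gen. */
static void model_invalidate_bucket(struct model_bucket *b)
{
	b->gen++;
}

/*
 * Replays the quoted steps and reports whether the key found on disk
 * after the power loss is both dirty and stale.
 */
static bool model_power_loss_leaves_stale_dirty(void)
{
	struct model_bucket bucket = { .gen = 7 };
	struct model_key on_disk = { .ptr_gen = 7, .dirty = true }; /* k1 */
	struct model_key in_mem = on_disk;

	assert(!model_ptr_stale(&bucket, &on_disk)); /* k1 starts out valid */

	in_mem.dirty = false;             /* k1 -> k2, in memory only */
	model_invalidate_bucket(&bucket); /* GC reclaims, allocator bumps gen */
	(void)in_mem;                     /* power loss: k2 never hits disk */

	return on_disk.dirty && model_ptr_stale(&bucket, &on_disk);
}
```

In this model model_power_loss_leaves_stale_dirty() returns true, and
nothing in it depends on the journal, which is the point I am making.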

>> In bch_extent_bad(), when expensive_debug_checks is off, it would
>> treat a dirty key as good even if it is stale, and that would
>> cause the problems below:
>> A) In read_dirty() it would crash the machine:
>>    BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
>> B) It could be worse when a read hits a stale dirty key: it would
>>    return old, incorrect data.

>Neither of these can happen until after journal replay is finished. Prior to
>journal replay we expect to find stale dirty keys - if we find any after journal
>replay then it's indicative of a real bug.
As I said previously, nothing is written to the journal after a replace key
is inserted in writeback, so this issue has nothing to do with the journal.

This is a real problem in my environment. After running IO for some time, I
cut the power suddenly and then power the machine on again, and it crashes in
read_dirty() because of the stale && dirty keys.
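
For illustration, the difference in behaviour I am describing can be
modelled in userspace as below. This is a sketch under assumptions, not
the patch itself: all sketch_* names are hypothetical, the "old" function
only models the case where expensive_debug_checks is off, and the assert
stands in for the BUG_ON in read_dirty().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened key: pointer gen, current bucket gen, dirty bit. */
struct sketch_key { uint8_t ptr_gen; uint8_t bucket_gen; bool dirty; };

typedef bool (*sketch_bad_fn)(const struct sketch_key *);

/* Stale when the bucket gen is ahead of the pointer gen (with wraparound). */
static bool sketch_stale(const struct sketch_key *k)
{
	uint8_t r = (uint8_t)(k->bucket_gen - k->ptr_gen);

	return r != 0 && r <= 128U;
}

/* Modelled current behaviour: a dirty key is accepted without a stale check. */
static bool sketch_bad_old(const struct sketch_key *k)
{
	if (k->dirty)
		return false;
	return sketch_stale(k);
}

/* Modelled proposed behaviour: a stale key is bad, dirty or not. */
static bool sketch_bad_new(const struct sketch_key *k)
{
	return sketch_stale(k);
}

/* Counts stale dirty keys that the given check lets through as good. */
static size_t sketch_stale_dirty_survivors(const struct sketch_key *keys,
					   size_t n, sketch_bad_fn bad)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (!bad(&keys[i]) && keys[i].dirty && sketch_stale(&keys[i]))
			count++;
	return count;
}

/* Writeback-style scan using the proposed check; returns keys queued. */
static size_t sketch_writeback_scan(const struct sketch_key *keys, size_t n)
{
	size_t i, queued = 0;

	for (i = 0; i < n; i++) {
		if (sketch_bad_new(&keys[i]) || !keys[i].dirty)
			continue;
		assert(!sketch_stale(&keys[i])); /* stands in for the BUG_ON */
		queued++;
	}
	return queued;
}
```

With the old check a stale dirty key survives filtering and would reach
the assert in a writeback scan; with the new check it is dropped first.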


