On 2019/3/22 2:14 PM, Coly Li wrote:
> On 2019/3/21 10:16 PM, Coly Li wrote:
>> Hi Junhui,
>>
>> Now I am able to understand your patch. Yes, this patch may fix one of
>> the conditions under which a jset gets lost.
>>
>> We should have this fix in v5.1; I will handle the format issue. And if
>> you don't mind, I may re-compose the commit log to explain what exactly
>> is fixed.
>
> Hi Junhui,
>
> While reviewing the patch, I found one point I still do not
> understand; could you please give me more hints?
>
> From your commit log, "so when we doing replay, journals from
> last_seq_wrote to last_seq_now are missing." Can you show me the code
> that explains how such a condition happens?

Aha, I realize this is for the discard-enabled condition. Hmm, but discard
is disabled by default.

Hi Dennis,

Is discard enabled in your environment, or is it just left at the default
(disabled)?

Thanks.

Coly Li

>
>> On 2019/3/21 7:04 PM, Junhui Tang wrote:
>>> I met this bug and sent a patch before.
>>> Please have a try with this patch:
>>>
>>> https://www.spinics.net/lists/linux-bcache/msg06555.html
>>>
>>> From: Tang Junhui <tang.junhui.linux@xxxxxxxxx>
>>> Date: Wed, 12 Sep 2018 04:42:14 +0800
>>> Subject: [PATCH] bcache: fix failure in journal replay
>>>
>>> Journal replay failed with these messages:
>>> Sep 10 19:10:43 ceph kernel: bcache: error on
>>> bb379a64-e44e-4812-b91d-a5599871a3b1: bcache: journal entries
>>> 2057493-2057567 missing! (replaying 2057493-2076601), disabling
>>> caching
>>>
>>> The reason is that in journal_reclaim() we send the discard command
>>> and reclaim those journal buckets whose seq is older than
>>> last_seq_now; but if the machine restarts before we write a journal
>>> entry carrying last_seq_now, that journal entry never reaches the
>>> journal bucket, and the last_seq_wrote in the newest on-disk journal
>>> is older than the last_seq_now we expect it to be. So when we do the
>>> replay, the journal entries from last_seq_wrote to last_seq_now are
>>> missing.
>>>
>>> It's hard to write a journal entry immediately after journal_reclaim(),
>>> and it is harmless if those missed journal entries were caused by
>>> discarding, since their contents have already been written to btree
>>> nodes. So, if the missing seqs start from the beginning of the journal,
>>> we treat it as normal, only print a message reporting the missing
>>> entries, and point out that it may be caused by discarding.
>>>
>>> Signed-off-by: Tang Junhui <tang.junhui.linux@xxxxxxxxx>
>>> ---
>>>  drivers/md/bcache/journal.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
>>> index 10748c6..9b4cd2e 100644
>>> --- a/drivers/md/bcache/journal.c
>>> +++ b/drivers/md/bcache/journal.c
>>> @@ -328,9 +328,13 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list)
>>>  	list_for_each_entry(i, list, list) {
>>>  		BUG_ON(i->pin && atomic_read(i->pin) != 1);
>>>
>>> -		cache_set_err_on(n != i->j.seq, s,
>>> -"bcache: journal entries %llu-%llu missing! (replaying %llu-%llu)",
>>> +		if (n != i->j.seq && n == start)
>>> +			pr_info("bcache: journal entries %llu-%llu may be discarded! (replaying %llu-%llu)",
>>>  				n, i->j.seq - 1, start, end);
>>> +		else
>>> +			cache_set_err_on(n != i->j.seq, s,
>>> +				"bcache: journal entries %llu-%llu missing! (replaying %llu-%llu)",
>>> +				n, i->j.seq - 1, start, end);
>>>
>>>  		for (k = i->j.start;
>>>  		     k < bset_bkey_last(&i->j);
>>> --
>>> 1.8.3.1
>>>
>>>
>>> Coly Li <colyli@xxxxxxx> wrote on Thu, Mar 21, 2019 at 12:52 PM:
>>>
>>> On 2019/3/21 3:33 AM, Dennis Schridde wrote:
>>>> On Wednesday, 20 March 2019 12:16:29 CET, Coly Li wrote:
>>>>> On 2019/3/20 5:42 AM, Dennis Schridde wrote:
>>>>>> Hello!
>>>>>>
>>>>>> During boot my bcache device cannot be activated anymore and
>>>>>> hence the filesystem content is inaccessible.
>>>>>> It appears that
>>>>>> parts of the journal are corrupted, since dmesg says:
>>>>>> ```
>>>>>> bcache: register_bdev() registered backing device sda3
>>>>>> bcache: error on UUID: bcache: journal entries X-Y missing! (replaying X-Z), disabling caching
>>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>>> bcache: bch_btree_insert() error -5
>>>>>> bcache: bch_cached_dev_attach() Can't attach sda3: shutting down
>>>>>> bcache: register_cache() registered cache device nvme0n1
>>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>>> bcache: cache_set_free() Cache set UUID unregistered
>>>>>> ```
>>>>>>
>>>>>> UUID represents a UUID. X, Y, Z are integers, with X<Y<Z,
>>>>>> Y=X+12 and Z=Y+116.
>>>>>>
>>>>>> Error -5 is EIO, i.e. a generic I/O error. Is there a way to
>>>>>> get more information on where that error originates and what
>>>>>> exactly is broken? Did bcache just detect broken data, or is
>>>>>> the device itself broken? Which device, the HDD or the NVMe
>>>>>> SSD?
>>>>>>
>>>>>> Is there a way to recover from this without losing all data
>>>>>> on the drive? Is it maybe possible to just discard the
>>>>>> journal entries >X and return to the state the block device
>>>>>> was in at point X, losing only modifications after that point?
>>>>>>
>>>>>> Background: The situation appeared after my computer had been
>>>>>> running for a few hours and the screen stayed dark when I tried
>>>>>> to wake the monitor from standby.
>>>>>> The machine did not react to
>>>>>> NumLock or Ctrl+Alt+Del, so I issued a magic SysRq and tried
>>>>>> to safely reboot the machine by slowly typing REISUB. Sadly,
>>>>>> after this the machine ended up in the state described above.
>>>>>
>>>>> It seems some journal set was lost during bch_journal_replay()
>>>>> after the reboot, when the cache set started.
>>>>>
>>>>> During my testing of a journal deadlock fix, I also observed
>>>>> this issue. If I change the number of journal buckets from 256
>>>>> to 8, the problem can be observed on almost every reboot.
>>>>>
>>>>> This one is not fixed yet and I am currently working on it.
>>>>>
>>>>> What kernel version do you use? I thought this issue was only
>>>>> introduced by my current changes, but from your report it seems
>>>>> such a problem happens in the upstream kernel as well.
>>>
>>>> I was using Linux 5.0.2 (with Gentoo patches, which are minimal,
>>>> AFAIK).
>>>>
>>>> I would have expected that S and/or U in REISUB would write all
>>>> bcache metadata to disk and prevent such problems. Is this a
>>>> wrong assumption?
>>>>
>>>> Will your patches allow me to use the cache again, or will they
>>>> prevent the metadata from breaking in the first place?
>>>
>>> Now I am still looking for the reason why such a problem happens.
>>> Once I have a fix, I will let you know.
>>>
>>> Thanks.
>>>
>>> Coly Li

-- 
Coly Li