On 2019/3/21 10:16 PM, Coly Li wrote:
> Hi Junhui,
>
> Now I am able to understand your patch. Yes, this patch may fix one of
> the conditions under which a jset gets lost.
>
> We should have this fix in v5.1, and I will handle the format issue. If
> you don't mind, I may re-compose a commit log to explain what exactly
> is fixed.

Hi Junhui,

While reviewing the patch, I found one point I still do not understand;
could you please give me more hints?

From your commit log: "so when we doing replay, journals from
last_seq_wrote to last_seq_now are missing." Can you show me the code
that explains how such a condition happens?

Thanks in advance.

Coly Li

> On 2019/3/21 7:04 PM, Junhui Tang wrote:
>> I met this bug and sent a patch before.
>> Please give this patch a try:
>>
>> https://www.spinics.net/lists/linux-bcache/msg06555.html
>>
>> From: Tang Junhui <tang.junhui.linux@xxxxxxxxx>
>> Date: Wed, 12 Sep 2018 04:42:14 +0800
>> Subject: [PATCH] bcache: fix failure in journal replay
>>
>> Journal replay failed with these messages:
>> Sep 10 19:10:43 ceph kernel: bcache: error on
>> bb379a64-e44e-4812-b91d-a5599871a3b1: bcache: journal entries
>> 2057493-2057567 missing! (replaying 2057493-2076601), disabling
>> caching
>>
>> The reason is that in journal_reclaim() we send a discard command and
>> reclaim those journal buckets whose seq is older than last_seq_now,
>> but the machine is restarted before we write a journal with
>> last_seq_now. So the journal with last_seq_now is never written to
>> its journal bucket, and last_seq_wrote in the newest on-disk journal
>> is older than the last_seq_now we expect it to be; when we do the
>> replay, journals from last_seq_wrote to last_seq_now are missing.
>>
>> It is hard to write a journal immediately after journal_reclaim(),
>> and it is harmless if those missed journals were dropped by
>> discarding, since those journals were already written to btree nodes.
>> So, if the missed seqs start from the beginning of the journal, we
>> treat it as normal and only print a message to show the missed
>> journals, pointing out that they may have been caused by discarding.
>>
>> Signed-off-by: Tang Junhui <tang.junhui.linux@xxxxxxxxx>
>> ---
>>  drivers/md/bcache/journal.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
>> index 10748c6..9b4cd2e 100644
>> --- a/drivers/md/bcache/journal.c
>> +++ b/drivers/md/bcache/journal.c
>> @@ -328,9 +328,13 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list)
>>  	list_for_each_entry(i, list, list) {
>>  		BUG_ON(i->pin && atomic_read(i->pin) != 1);
>>
>> -		cache_set_err_on(n != i->j.seq, s,
>> -"bcache: journal entries %llu-%llu missing! (replaying %llu-%llu)",
>> +		if (n != i->j.seq && n == start)
>> +			pr_info("bcache: journal entries %llu-%llu may be discarded! (replaying %llu-%llu)",
>>  				 n, i->j.seq - 1, start, end);
>> +		else
>> +			cache_set_err_on(n != i->j.seq, s,
>> +"bcache: journal entries %llu-%llu missing! (replaying %llu-%llu)",
>> +				 n, i->j.seq - 1, start, end);
>>
>>  		for (k = i->j.start;
>>  		     k < bset_bkey_last(&i->j);
>> --
>> 1.8.3.1
>>
>>
>> Coly Li <colyli@xxxxxxx> wrote on Thursday, March 21, 2019 at 12:52 PM:
>>
>> On 2019/3/21 3:33 AM, Dennis Schridde wrote:
>>> On Wednesday, 20 March 2019, 12:16:29 CET, Coly Li wrote:
>>>> On 2019/3/20 5:42 AM, Dennis Schridde wrote:
>>>>> Hello!
>>>>>
>>>>> During boot my bcache device cannot be activated anymore, and
>>>>> hence the filesystem content is inaccessible. It appears that
>>>>> parts of the journal are corrupted, since dmesg says:
>>>>> ```
>>>>> bcache: register_bdev() registered backing device sda3
>>>>> bcache: error on UUID: bcache: journal entries X-Y missing!
>>>>> (replaying X-Z), disabling caching
>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>> bcache: bch_btree_insert() error -5
>>>>> bcache: bch_cached_dev_attach() Can't attach sda3: shutting down
>>>>> bcache: register_cache() registered cache device nvme0n1
>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>>>> bcache: cache_set_free() Cache set UUID unregistered
>>>>> ```
>>>>>
>>>>> UUID represents a UUID. X, Y and Z are integers, with X < Y < Z,
>>>>> Y = X + 12 and Z = Y + 116.
>>>>>
>>>>> Error -5 is EIO, i.e. a generic I/O error. Is there a way to get
>>>>> more information on where that error originates and what exactly
>>>>> is broken? Did bcache just detect broken data, or is the device
>>>>> itself broken? Which device, the HDD or the NVMe SSD?
>>>>>
>>>>> Is there a way to recover from this without losing all the data
>>>>> on the drive? Is it maybe possible to just discard the journal
>>>>> entries > X and return to the state the block device was in at
>>>>> point X, losing only modifications after that point?
>>>>>
>>>>> Background: The situation appeared after my computer had been
>>>>> running for a few hours and the screen stayed dark when I tried
>>>>> to wake the monitor from standby. The machine did not react to
>>>>> NumLock or Ctrl+Alt+Del, so I issued a magic SysRq and tried to
>>>>> safely reboot the machine by slowly typing REISUB. Sadly, after
>>>>> this the machine ended up in the state described above.
>>>>
>>>> It seems some journal set was lost during bch_journal_replay()
>>>> after the reboot, when the cache set started.
>>>>
>>>> During my test of a journal deadlock fix, I also observed this
>>>> issue.
>>>> When I change the number of journal buckets from 256 to 8, the
>>>> problem can be observed on almost every reboot.
>>>>
>>>> This one is not fixed yet, and I am currently working on it.
>>>>
>>>> What kernel version do you use? I thought this issue was only
>>>> introduced by my current changes, but from your report it seems
>>>> such a problem happens in the upstream kernel as well.
>>
>>> I was using Linux 5.0.2 (with Gentoo patches, which are minimal,
>>> AFAIK).
>>
>>> I would have expected that the S and/or U in REISUB would write all
>>> bcache metadata to disk and prevent such problems. Is this a wrong
>>> assumption?
>>
>>> Will your patches allow me to use the cache again, or will they
>>> prevent the metadata from breaking in the first place?
>>
>> I am still looking into how such a problem happens. Once I have a
>> fix, I will let you know.
>>
>> Thanks.
>>
>> Coly Li

--
Coly Li