Re: Recover from "journal entries X-Y missing! (replaying X-Z)", "IO error on writing btree."

On Thursday, 21 March 2019 12:04:02 CET Junhui Tang wrote:
> I hit this bug and sent a patch for it earlier.
> Please give this patch a try.
> 
> https://www.spinics.net/lists/linux-bcache/msg06555.html
> 
> From: Tang Junhui <tang.junhui.linux@xxxxxxxxx>
> Date: Wed, 12 Sep 2018 04:42:14 +0800
> Subject: [PATCH] bcache: fix failure in journal relplay
> 
> journal replay failed with messages:
> Sep 10 19:10:43 ceph kernel: bcache: error on
> bb379a64-e44e-4812-b91d-a5599871a3b1: bcache: journal entries
> 2057493-2057567 missing! (replaying 2057493-2076601), disabling
> caching
> 
> The cause is in journal_reclaim(): we send a discard command and
> reclaim the journal buckets whose seq is older than last_seq_now.
> If the machine restarts before a journal entry carrying last_seq_now
> is written, that entry never reaches the journal bucket, and the
> last_seq_wrote recorded in the newest journal entry is older than
> the last_seq_now we expect. During replay, the journal entries from
> last_seq_wrote to last_seq_now are therefore missing.
> 
> It is hard to write a journal entry immediately after
> journal_reclaim(), and the missing entries are harmless if they were
> lost to discarding, since they have already been written to the
> btree nodes. So, if the missing seqs start at the beginning of the
> journal, we treat this as normal: we only print a message listing
> the missing entries and pointing out that discarding may have
> caused the gap.
> 
> Signed-off-by: Tang Junhui <tang.junhui.linux@xxxxxxxxx>
> ---
>  drivers/md/bcache/journal.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
> index 10748c6..9b4cd2e 100644
> --- a/drivers/md/bcache/journal.c
> +++ b/drivers/md/bcache/journal.c
> @@ -328,9 +328,13 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list)
>  	list_for_each_entry(i, list, list) {
>  		BUG_ON(i->pin && atomic_read(i->pin) != 1);
> 
> -		cache_set_err_on(n != i->j.seq, s,
> -"bcache: journal entries %llu-%llu missing! (replaying %llu-%llu)",
> +		if (n != i->j.seq && n == start)
> +			pr_info("bcache: journal entries %llu-%llu may be discarded! (replaying %llu-%llu)",
>  				 n, i->j.seq - 1, start, end);
> +		else
> +			cache_set_err_on(n != i->j.seq, s,
> +				"bcache: journal entries %llu-%llu missing! (replaying %llu-%llu)",
> +				n, i->j.seq - 1, start, end);
> 
>  		for (k = i->j.start;
>  		     k < bset_bkey_last(&i->j);
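
The check that the quoted patch changes can be modelled in user space roughly as follows. This is only a sketch of the idea, not kernel code: check_replay(), the replay_check enum and the plain array of sequence numbers are hypothetical names made up for illustration, and the real bch_journal_replay() walks a list of journal entries and replays their keys as well. The model assumes the entries are sorted by sequence number and none is older than start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result type; the kernel function has no such enum. */
enum replay_check {
	REPLAY_OK,            /* sequence numbers are contiguous */
	REPLAY_GAP_AT_START,  /* only the oldest entries are missing: tolerated */
	REPLAY_GAP_IN_MIDDLE  /* a gap after replay has begun: treated as fatal */
};

/*
 * seqs[] holds the sequence numbers of the journal entries that were
 * found, in ascending order. start is the first sequence number that
 * replay expects to see.
 */
enum replay_check check_replay(const uint64_t *seqs, size_t count,
			       uint64_t start)
{
	enum replay_check result = REPLAY_OK;
	uint64_t n = start;	/* next expected sequence number */

	for (size_t i = 0; i < count; i++) {
		/* Model assumption: sorted input, nothing older than n. */
		assert(seqs[i] >= n);

		if (n != seqs[i]) {
			if (n == start) {
				/* Gap begins at the very first expected entry. */
				printf("entries %llu-%llu may be discarded\n",
				       (unsigned long long)n,
				       (unsigned long long)(seqs[i] - 1));
				result = REPLAY_GAP_AT_START;
			} else {
				/* Gap after at least one entry was seen. */
				printf("entries %llu-%llu missing\n",
				       (unsigned long long)n,
				       (unsigned long long)(seqs[i] - 1));
				return REPLAY_GAP_IN_MIDDLE;
			}
		}
		n = seqs[i] + 1;
	}
	return result;
}
```

With start = 2057493 and a first entry of 2057568, as in the log above, the model reports the range 2057493-2057567 as possibly discarded and carries on. A gap that appears after the first entry has been seen still ends the check with an error.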

Hi!

Thanks a lot!  I patched Linux 5.0.2 with your patch (after cleaning up the 
whitespace to match the actual source) and was able to boot my machine with 
it, which cleaned up the bcache issue and allowed me to subsequently boot the 
machine using an unpatched kernel again.

Thanks again,
Dennis


