Re: Recover from "journal entries X-Y missing! (replaying X-Z)", "IO error on writing btree."

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 2019/3/21 3:33 AM, Dennis Schridde wrote:
> On Wednesday, 20 March 2019 12:16:29 CET Coly Li wrote:
>> On 2019/3/20 5:42 AM, Dennis Schridde wrote:
>>> Hello!
>>> 
>>> During boot my bcache device cannot be activated anymore and
>>> hence the filesystem content is inaccessible.  It appears that
>>> parts of the journal are corrupted, since dmesg says:
>>> ```
>>> bcache: register_bdev() registered backing device sda3
>>> bcache: error on UUID: bcache: journal entries X-Y missing! (replaying X-Z) , disabling caching
>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>> bcache: bch_btree_insert() error -5
>>> bcache: bch_cached_dev_attach() Can't attach sda3: shutting down
>>> bcache: register_cache() registered cache device nvme0n1
>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
>>> bcache: cache_set_free() Cache set UUID unregistered
>>> ```
>>> 
>>> UUID represents a UUID.  X, Y, Z are integers, with X<Y<Z,
>>> Y=X+12 and Z=Y+116.
>>> 
>>> Error -5 is EIO, i.e. a generic I/O error.  Is there a way to
>>> get more information on where that error originates from and
>>> what exactly is broken? Did bcache just detect broken data, or
>>> is the device itself broken?  Which device, the HDD or the NVMe
>>> SSD?
>>> 
>>> Is there a way to recover from this without losing all data
>>> on the drive?  Is it maybe possible to just discard the
>>> journal entries >X and return to the state the block device was
>>> in at point X, losing only modifications after that point?
>>> 
>>> Background: The situation appeared after my computer was
>>> running for a few hours and the screen stayed dark when I tried
>>> to wake the monitor from standby.  The machine did not react to
>>> NumLock or Ctrl+Alt+Del, so I issued a magic SysRq and tried
>>> to safely reboot the machine by slowly typing REISUB. Sadly
>>> after this the machine ended up in the state described above.
>> 
>> It seems some journal sets were lost during bch_journal_replay()
>> when the cache set was started after the reboot.
>> 
>> While testing a journal deadlock fix, I also observed this
>> issue. When I change the number of journal buckets from 256 to 8,
>> the problem can be observed on almost every reboot.
>> 
>> This one is not fixed yet and I am currently working on it.
>> 
>> What kernel version do you use?  I thought this issue was only
>> introduced by my current changes, but from your report it seems
>> the problem happens in the upstream kernel as well.
> 
> I was using Linux 5.0.2 (with Gentoo patches, which are minimal,
> AFAIK).
> 
> I would have expected that S and/or U in REISUB would write all
> bcache metadata to disk and prevent such problems.  Is this a wrong
> assumption?
> 
> Will your patches allow me to use the cache again, or will they
> prevent the metadata from breaking in the first place?

I am still looking for the reason why this problem happens. Once I
have a fix, I will let you know.

Thanks.

Coly Li


- -- 

Coly Li
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEE6j5FL/T5SGCN6PrQxzkHk2t9+PwFAlyTGEQACgkQxzkHk2t9
+Pwa2A//Xx3NcWEwxBkG9cYX8UjjLGilJO9PGtfC1U4CIKscFalohJ4f+28Vt/Qv
Iqsbo87O+YqzidOI4L1StUOvCgMexg5A0GoUTFCZLI8G1p6tH6oUUKylmuiDSSiB
ZmDZNzHJRPiC8y1fJwxbFoE1Jx1nvlwFaOJz7OSJfPoxEc8ZCHV/bOseD0xxtB/c
+Pap1yXkS0uBvDNjh9gePMfgq5QYEA+9yrRSCDUPFR38MirrbSw+gGgEqiU3v7YN
f6axuJBAwyDwGMXLG5iYqWUgMuIFWy7kjTJJsIDAZiPqojFgaWdStZM0ynnd2+3P
OpmwipjUksy5L08ZwPvqW562JdvdjksijIrF9vxo+UYhZocbM2IX/8+YSUhFbjVs
OfHHvUXhJs/argIJNGzw5QW18Uepi5J8WrJdONboMGsE1PS0zKQjsT0ToVuGG91t
WH6fOwQtuhuPWt66APWE0/WT+J2wtvfIGRYNSEaNGbJSVC4S8pzPa0r+0kRnu361
gxOVYsMe8/Quc1r+1Y/z/2jEG1Cut67Ed1U64ebIMnrAXYW+8D4IaC4qKnPumYHx
fgJKL4WAaj/9aVOaU6vyKX9GG83NoUjN6MxpCx2RPVdKOk3lv3Y4uzBDYtDoBjet
Gq9E73720JiTL4qWWXyJHwxmwMNNwi9+POdjQAaUo4R1E9Mm+uA=
=vCRz
-----END PGP SIGNATURE-----
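
On the question of which device the errors point to and what "error -5"
means: the following is a minimal Python sketch, not part of the original
thread, that tallies the bch_count_io_errors() lines per device and decodes
the numeric error code. The sample text is abridged from the dmesg excerpt
quoted above. The function name summarize() and both regular expressions are
illustrative assumptions about the log format, not anything bcache provides.

```python
import errno
import re
from collections import Counter

# Abridged from the dmesg excerpt in the report.
SAMPLE = """\
bcache: register_bdev() registered backing device sda3
bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
bcache: bch_btree_insert() error -5
bcache: bch_cached_dev_attach() Can't attach sda3: shutting down
bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
"""

# Assumed message shapes, inferred only from the lines above.
IO_ERROR = re.compile(r"bch_count_io_errors\(\) (\S+): IO error")
ERRNO = re.compile(r"error (-\d+)\b")


def summarize(log_text):
    """Return (I/O error count per device, set of symbolic errno names)."""
    per_device = Counter(IO_ERROR.findall(log_text))
    names = {
        errno.errorcode.get(-int(code), "unknown")
        for code in ERRNO.findall(log_text)
    }
    return per_device, names


if __name__ == "__main__":
    per_device, names = summarize(SAMPLE)
    for device, count in per_device.items():
        print(f"{device}: {count} I/O error message(s)")
    print("error codes seen:", ", ".join(sorted(names)))
```

On a live system the same function could be fed the output of dmesg. The
tally only shows which device the kernel named in each message; it cannot
tell a failing drive apart from writes that were rejected for another reason.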


