On Wednesday, 20 March 2019 12:16:29 CET Coly Li wrote:
> On 2019/3/20 5:42 AM, Dennis Schridde wrote:
> > Hello!
> >
> > During boot my bcache device cannot be activated anymore and hence
> > the filesystem content is inaccessible. It appears that parts of
> > the journal are corrupted, since dmesg says:
> > ```
> > bcache: register_bdev() registered backing device sda3
> > bcache: error on UUID: bcache: journal entries X-Y missing! (replaying X-Z) , disabling caching
> > bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> > bcache: bch_btree_insert() error -5
> > bcache: bch_cached_dev_attach() Can't attach sda3: shutting down
> > bcache: register_cache() registered cache device nvme0n1
> > bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> > bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> > bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> > bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> > bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> > bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> > bcache: cache_set_free() Cache set UUID unregistered
> > ```
> >
> > UUID represents a UUID. X, Y, Z are integers, with X<Y<Z, Y=X+12
> > and Z=Y+116.
> >
> > Error -5 is EIO, i.e. a generic I/O error. Is there a way to get
> > more information on where that error originates from and what
> > exactly is broken? Did bcache just detect broken data, or is the
> > device itself broken? Which device, the HDD or the NVMe SSD?
> >
> > Is there a way to recover from this without losing all data on
> > the drive? Is it maybe possible to just discard the journal
> > entries >X and return to the state the block device was at point X,
> > losing only modifications after that point?
> >
> > Background: The situation appeared after my computer was running
> > for a few hours and the screen stayed dark when I tried to wake
> > the monitor from standby. The machine did not react to NumLock or
> > Ctrl+Alt+Del, so I issued a magic SysRq and tried to safely
> > reboot the machine by slowly typing REISUB. Sadly after this the
> > machine ended up in the state described above.
>
> It seems some journal set was lost during bch_journal_replay() after
> reboot and start cache set.
>
> During my test for a journal deadlock fix, I also observe this issue.
> I change the journal buckets number from 256 to 8, such problem can be
> observe almost every reboot.
>
> This one is not fixed yet and I am currently working on it.
>
> What kernel version do you use? I thought this issue was only
> introduced by my current changes, but from your report it seems such
> problem happens in upstream kernel as well.

I was using Linux 5.0.2 (with Gentoo patches, which are minimal, AFAIK).

I would have expected that S and/or U in REISUB would write all bcache
metadata to disk and prevent such problems. Is this a wrong assumption?

Will your patches allow me to use the cache again, or will they prevent
the metadata from breaking in the first place?

--Dennis