On 2019/3/20 5:42 AM, Dennis Schridde wrote:
> Hello!
>
> During boot my bcache device cannot be activated anymore and hence
> the filesystem content is inaccessible. It appears that parts of
> the journal are corrupted, since dmesg says:
> ```
> bcache: register_bdev() registered backing device sda3
> bcache: error on UUID: bcache: journal entries X-Y missing! (replaying X-Z) , disabling caching
> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> bcache: bch_btree_insert() error -5
> bcache: bch_cached_dev_attach() Can't attach sda3: shutting down
> bcache: register_cache() registered cache device nvme0n1
> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
> bcache: cache_set_free() Cache set UUID unregistered
> ```
>
> UUID represents a UUID. X, Y, Z are integers, with X<Y<Z, Y=X+12
> and Z=Y+116.
>
> Error -5 is EIO, i.e. a generic I/O error. Is there a way to get
> more information on where that error originates and what exactly
> is broken? Did bcache just detect broken data, or is the device
> itself broken? Which device, the HDD or the NVMe SSD?
>
> Is there a way to recover from this without losing all data on
> the drive? Is it maybe possible to just discard the journal
> entries >X and return to the state the block device was in at
> point X, losing only modifications after that point?
>
> Background: The situation appeared after my computer had been
> running for a few hours and the screen stayed dark when I tried to
> wake the monitor from standby. The machine did not react to NumLock
> or Ctrl+Alt+Del, so I issued a magic SysRq and tried to safely
> reboot the machine by slowly typing REISUB. Sadly, after this the
> machine ended up in the state described above.

It seems that some journal set was lost during bch_journal_replay(),
when the cache set was started after the reboot. I have also observed
this issue while testing a journal deadlock fix: when I change the
number of journal buckets from 256 to 8, the problem can be observed
on almost every reboot. It is not fixed yet, and I am currently
working on it.

What kernel version do you use? I thought this issue was only
introduced by my current changes, but from your report it seems the
problem happens in the upstream kernel as well.

Thanks.

-- 
Coly Li