Hi Kent,

On Wed, Nov 10, 2021 at 5:13 PM Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
> Your journal is completely full, so persisting the new btree root while doing
> journal replay is hanging.
>
> There isn't a _good_ solution for this journal deadlock in bcache (it's fixed in
> bcachefs), but there is a hack:
>
> edit drivers/md/bcache/btree.c line 2493
>
> delete the call to bch_journal_meta(), and build a new kernel. Once you've
> gotten it to register, do a clean shutdown and then go back to a stock kernel.
>
> Running the kernel with that call deleted won't be safe if you crash, but it'll
> get you going again.

Thanks for the clarification and suggestions.

Would it be OK to implement that workaround if requested by a sysadmin?
(say, to ack the data safety / crash risk)

Right now the issue is known, reproduces with v5.15, has no good solution,
remains after reboot, prints hung task warnings continuously, and prevents
using the device at all; and this workaround requires kernel dev/build skills.

Since its effects seem bad enough, it would seem fair enough to provide
a way out even if it's not a _good_ one.

Say, we could try to detect a full journal during journal replay, and
handle it by failing the device registration. This would unblock the tasks,
and provide a more intuitive error message (maybe leading to the next
paragraph).

We could also add a sysfs tunable to skip the call to bch_journal_meta(),
and allow the registration to proceed, but fail it unconditionally in the end
so the device isn't used with the data safety / crash risk (or force an automatic
unregister + register again with bch_journal_meta(), and disable the sysfs
tunable). This would help with the full journal, and allow a sysadmin to perform
the workaround without a kernel rebuild and reboots.

Thanks!

-- 
Mauricio Faria de Oliveira
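
P.S. To make the two ideas in this mail a bit more concrete, here is a rough userspace sketch in C. It is only a toy model and not bcache code: every type and name in it (toy_journal, toy_opts, skip_journal_meta, toy_register) is hypothetical, and it assumes that finishing replay releases the pinned journal entries, which may not match what the real code does. It only illustrates the intended control flow: fail registration with an error instead of hanging when the journal is full, and, with the tunable set, let replay finish but still refuse to leave the device usable.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and names are hypothetical, not bcache's. */
struct toy_journal {
    size_t capacity;  /* total journal slots */
    size_t used;      /* slots still pinned by entries not yet replayed */
};

struct toy_opts {
    bool skip_journal_meta;  /* stand-in for the proposed sysfs tunable */
};

static bool toy_journal_full(const struct toy_journal *j)
{
    return j->used >= j->capacity;
}

/*
 * Stand-in for the journal write issued while persisting the new btree
 * root. The real call blocks when the journal is full; this model
 * reports -ENOSPC instead.
 */
static int toy_journal_meta(struct toy_journal *j)
{
    if (toy_journal_full(j))
        return -ENOSPC;
    j->used++;
    return 0;
}

/*
 * Returns 0 on success or a negative errno. *usable says whether the
 * device may be used after this call.
 */
static int toy_register(struct toy_journal *j, const struct toy_opts *o,
                        bool *usable)
{
    int ret;

    *usable = false;

    if (toy_journal_full(j)) {
        /* Idea 1: fail registration cleanly instead of hanging. */
        if (!o->skip_journal_meta)
            return -ENOSPC;

        /*
         * Idea 2: skip the journal write so replay can complete
         * (assumed here to release the pinned entries), then fail
         * anyway so the device is never used in the unsafe mode.
         * The caller is expected to register again without the tunable.
         */
        j->used = 0;
        return -EAGAIN;
    }

    ret = toy_journal_meta(j);
    if (ret)
        return ret;

    j->used = 0;  /* replay finished, entries no longer pinned */
    *usable = true;
    return 0;
}
```

In this model, a first toy_register() on a full journal fails with -ENOSPC, a second one with skip_journal_meta set fails with -EAGAIN but leaves the journal empty, and a third one without the tunable succeeds. The error codes are placeholders; which ones would be appropriate in the driver is an open question.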