Re: bcache-register hang after reboot

On Thu, Nov 11, 2021 at 05:54:18PM -0300, Mauricio Oliveira wrote:
> Hi Kent,
> 
> On Wed, Nov 10, 2021 at 5:13 PM Kent Overstreet
> <kent.overstreet@xxxxxxxxx> wrote:
> > Your journal is completely full, so persisting the new btree root while doing
> > journal replay is hanging.
> >
> > There isn't a _good_ solution for this journal deadlock in bcache (it's fixed in
> > bcachefs), but there is a hack:
> >
> > edit drivers/md/bcache/btree.c line 2493
> >
> > delete the call to bch_journal_meta(), and build a new kernel. Once you've
> > gotten it to register, do a clean shutdown and then go back to a stock kernel.
> >
> > Running the kernel with that call deleted won't be safe if you crash, but it'll
> > get you going again.
> 
> Thanks for the clarification and suggestions.
> 
> Would it be OK to implement that workaround if requested by a sysadmin
> (say, to acknowledge the data safety / crash risk)?
> 
> Right now the issue is known, reproduces with v5.15, has no good solution,
> remains after reboot, prints hung task warnings continuously, and prevents
> using the device at all; and this workaround requires kernel dev/build skills.
> 
> Since its effects seem bad enough, it would seem fair enough to provide a
> way out even if it's not a _good_ one.
> 
> Say, we could try and detect the journal full during journal replay, and handle
> it by failing the device registration. This would unblock the tasks, and provide
> a more intuitive error message. (maybe leading to the next paragraph.)
> 
> We could also add a sysfs tunable to skip the call to bch_journal_meta()
> and allow the registration to proceed, but fail it unconditionally in the end
> so the device isn't used with the data safety / crash risk
> (or force an automatic unregister + register again with bch_journal_meta()
> and disable the sysfs tunable).
> 
> This would help with the full journal, and allow a sysadmin to perform the
> workaround without kernel rebuild and reboots.

I think the best solution might be to change bch_btree_set_root() to check if
we're in journal replay, and if we are, make the call to bch_journal_meta()
nonblocking - pass it NULL instead of a closure.
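
A rough userspace sketch of that idea, not the actual patch. The structs, the
in_journal_replay and journal_full fields, and the model_* function names are
all hypothetical stand-ins for illustration; how the real code would detect
journal replay is left open here. Each function returns true if the caller
would end up waiting on the journal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel types; NOT the real bcache definitions. */
struct closure { int unused; };

struct cache_set {
	bool in_journal_replay;	/* hypothetical: set while replaying the journal */
	bool journal_full;	/* hypothetical: no free journal space left */
};

/*
 * Model of a journal-meta call. With a closure, the caller waits for the
 * journal write, which never completes if the journal is full. With NULL,
 * the call is nonblocking and the caller never waits.
 */
static bool model_journal_meta(struct cache_set *c, struct closure *cl)
{
	assert(c != NULL);

	if (cl == NULL)
		return false;		/* nonblocking: nothing to wait on */

	return c->journal_full;		/* blocking: stuck if journal is full */
}

/* Model of setting a new btree root with the proposed replay check. */
static bool model_btree_set_root(struct cache_set *c)
{
	struct closure cl = { 0 };

	/* During replay pass NULL so a full journal cannot block us. */
	struct closure *waiter = c->in_journal_replay ? NULL : &cl;

	return model_journal_meta(c, waiter);
}
```

In this model, a cache set with both fields set no longer waits, while the
behaviour outside of journal replay is unchanged: a full journal still makes
the caller wait, as it does today.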


