On 4/19/19 6:04 PM, Coly Li wrote:
Another bcache journaling deadlock can happen during normal runtime. It is very rare, but people have reported the bkey insert work queue being blocked by such a deadlock.

This is how the journaling deadlock happens at runtime:
- The journal space is completely full with no free space to reclaim, and journaling tasks are waiting for space to write in journal_wait_for_write().
- In order to free journal space, btree_flush_write() is called to flush the earliest journaled in-memory btree keys into the btree node. Once all journaled bkeys in the earliest used journal bucket are flushed to the on-disk btree, that journal bucket can be reclaimed for new incoming journal requests.
- But if the earliest journaled bkey causes a btree node split while it is inserted into the btree node, bch_journal_meta() is eventually called to journal the btree root (and other information) into the journal space.
- Unfortunately the journal space is full, and the journal entries have to be flushed in linear order. So bch_journal_meta() from the bkey insert is blocked too.
Then the journaling deadlock during bcache runtime happens.

A method to fix such a deadlock is to reserve some journal space as well. The reserved space can only be used when:
- The current journal bucket is the last journal bucket which has available space to write into.
- When calling bch_journal(), the current jset is empty and there is no key in the inserting key list. This means the journal request is from bch_journal_meta() and no non-reserved space can be used.

Then if such a journaling request is from bch_journal_meta() while inserting the earliest journaled bkey back into the btree, the deadlock condition won't happen any more, because the reserved space can be used for this scenario.

Since there are already 6 sectors reserved for journal replay, here we reserve 7 sectors for the runtime meta journal from a btree split caused by flushing journal entries back to the btree node.
Depending on the block size, from 1 sector to 4KB, the reserved space can serve from 7 to 2 journal blocks. Indeed, reserving only one journal block for this journal deadlock scenario is enough; 2 consecutive btree splits caused by flushing two adjacent bkeys from the journal are very rare. So reserving 7 sectors should work.

Another reason for reserving 7 sectors is that there are already 6 sectors reserved for journal replay, so in total there are 13 sectors reserved in the last available journal bucket. 13 sectors won't be a proper bucket size, so we don't need to add more code to handle journal.blocks_free initialization for a wholly reserved journal bucket. Even though such code logic is simple, less code is better in my humble opinion.

Again, if in the future the reserved space turns out not to be enough, let's extend it then.

Signed-off-by: Coly Li <colyli@xxxxxxx>
---
 drivers/md/bcache/journal.c | 89 +++++++++++++++++++++++++++++++++------------
 drivers/md/bcache/journal.h |  1 +
 2 files changed, 66 insertions(+), 24 deletions(-)
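
For readers who want the reservation rule from the commit message in concrete form, here is a minimal userspace sketch. It is not the code from the patch: the struct, macro, and function names are all hypothetical, and only the two sector counts (6 for replay, 7 for runtime meta) and the two conditions come from the description above. It also assumes that the replay reserve stays untouchable at runtime.

```c
#include <stdbool.h>

/* Sector counts taken from the commit message; macro names are made up. */
#define REPLAY_RESERVE_SECTORS 6U /* already reserved for journal replay */
#define META_RESERVE_SECTORS   7U /* newly reserved for runtime meta journal */

/* Hypothetical, simplified view of the journal state. */
struct journal_model {
	unsigned int sectors_free; /* free sectors in the current bucket */
	bool last_bucket;          /* last bucket that still has free space? */
};

/* Hypothetical, simplified view of one journaling request. */
struct journal_req {
	bool jset_empty;      /* current jset holds no keys yet */
	unsigned int nkeys;   /* keys in the inserting key list */
	unsigned int sectors; /* sectors the request needs */
};

/* An empty jset plus an empty key list means a meta-only request. */
static bool is_meta_request(const struct journal_req *r)
{
	return r->jset_empty && r->nkeys == 0;
}

/* Sectors the request may consume once the reservations are honoured. */
static unsigned int usable_sectors(const struct journal_model *j,
				   const struct journal_req *r)
{
	unsigned int reserve = 0;

	if (j->last_bucket) {
		/* Assumption: replay reserve is never handed out at runtime. */
		reserve = REPLAY_RESERVE_SECTORS;
		/* Only meta requests may dip into the runtime reserve. */
		if (!is_meta_request(r))
			reserve += META_RESERVE_SECTORS;
	}
	return j->sectors_free > reserve ? j->sectors_free - reserve : 0;
}

/* True if the request fits without touching space reserved from it. */
static bool can_write(const struct journal_model *j,
		      const struct journal_req *r)
{
	return r->sectors <= usable_sectors(j, r);
}
```

In this model, with exactly 13 sectors left in the last bucket, a request carrying keys sees no usable space and has to wait, while a meta-only request still sees 7 sectors. That difference is what lets the btree split journal its new root and break the wait cycle.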
Reviewed-by: Hannes Reinecke <hare@xxxxxxxx>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Teamlead Storage & Networking
hare@xxxxxxx                                   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)