Re: [RFC PATCH v2 05/16] bcache: reserve space for journal_meta() in run time

On 4/19/19 6:04 PM, Coly Li wrote:
Another deadlock in bcache journaling can happen during normal bcache
runtime. It is very rare, but people have reported the bkey insert work
queue being blocked, which is caused by this deadlock.

This is how such a journaling deadlock happens at runtime,
- Journal space is completely full with no free space to reclaim, and
   journaling tasks are waiting for space to write in
   journal_wait_for_write().
- In order to free journal space, btree_flush_write() is called to
   flush the earliest journaled in-memory btree key into its btree node.
   Once all journaled bkeys in the earliest used journal bucket are
   flushed to the on-disk btree, that journal bucket can be reclaimed
   for new journal requests.
- But if the earliest journaled bkey causes a btree node split while
   being inserted into the btree node, journal_meta() will eventually be
   called to journal the btree root (and other information) into the
   journal space.
- Unfortunately the journal space is full, and journal entries have to
   be flushed in linear order. So bch_journal_meta() from the bkey
   insert is blocked too.
Then the journaling deadlock happens during bcache runtime.

One way to fix this deadlock is to reserve some journal space too. The
reserved space can only be used when,
- The current journal bucket is the last journal bucket which has
   available space to write into.
- When calling bch_journal(), the current jset is empty and there is no
   key in the inserting key list. This means the journal request is from
   bch_journal_meta() and no non-reserved space can be used.

Then if such a journaling request comes from bch_journal_meta() while
inserting the earliest journaled bkey back into the btree, the deadlock
no longer occurs because the reserved space can be used in this
scenario.
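
The two conditions above can be sketched as a small user-space model.
This is only an illustration under stated assumptions, not the code from
the patch: struct journal_model, is_meta_request() and may_write() are
hypothetical names, and the sketch assumes that a meta request may
consume the 7 runtime sectors while the 6 replay sectors stay untouched.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sector counts taken from the commit message. */
#define REPLAY_RESERVE_SECTORS  6U  /* already reserved for journal replay */
#define RUNTIME_RESERVE_SECTORS 7U  /* reserved for runtime bch_journal_meta() */

/* Hypothetical, simplified journal state (not the kernel's struct journal). */
struct journal_model {
	bool   on_last_bucket; /* current bucket is the last one with free space */
	size_t free_sectors;   /* free sectors left in the current bucket */
	bool   jset_empty;     /* current jset holds no keys */
};

/* An empty jset plus an empty key list identifies a meta request. */
static bool is_meta_request(const struct journal_model *j, size_t nr_keys)
{
	return j->jset_empty && nr_keys == 0;
}

/* Return true if the request may proceed without waiting for space. */
static bool may_write(const struct journal_model *j, size_t nr_keys,
		      size_t sectors)
{
	size_t reserved = 0;

	if (j->on_last_bucket) {
		/* Assumption: replay sectors are off limits for everyone. */
		reserved = REPLAY_RESERVE_SECTORS;
		/* Only meta requests may dip into the runtime reserve. */
		if (!is_meta_request(j, nr_keys))
			reserved += RUNTIME_RESERVE_SECTORS;
	}

	if (j->free_sectors < reserved)
		return false;
	return sectors <= j->free_sectors - reserved;
}
```

In this model, with 13 free sectors left in the last bucket, a normal
request has to wait while a meta request of up to 7 sectors can still
proceed, which is what breaks the cycle described above.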

Since there are already 6 sectors reserved for journal replay, here we
reserve 7 sectors for runtime meta journal writes from btree splits
caused by flushing journal entries back to btree nodes. Depending on the
block size, from 1 sector to 4KB, the reserved space can serve from 7 to
2 journal blocks. Indeed, reserving only one journal block for this
deadlock scenario is enough; 2 consecutive btree splits caused by
flushing two adjacent bkeys from the journal are very rare. So reserving
7 sectors should work.

Another reason for reserving 7 sectors is that there are already 6
sectors reserved for journal replay, so in total there are 13 sectors
reserved in the last available journal bucket. 13 sectors won't be a
proper bucket size, so we don't need to add more code to handle
journal.blocks_free initialization for a wholly reserved journal bucket.
Even though such code is simple, less code is better in my humble
opinion.
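
A minimal sketch of why 13 sectors cannot be a whole bucket. It assumes
that bcache only accepts bucket sizes that are a power of two in
sectors; the commit message does not state this, so treat it as an
assumption. The helper names are hypothetical.

```c
#include <stdbool.h>

/* Sector counts taken from the commit message. */
#define REPLAY_RESERVE_SECTORS  6U
#define RUNTIME_RESERVE_SECTORS 7U
#define TOTAL_RESERVE_SECTORS   (REPLAY_RESERVE_SECTORS + RUNTIME_RESERVE_SECTORS)

static bool is_power_of_two(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * True only if a valid bucket could consist entirely of reserved
 * sectors. Assumption: a valid bucket size is a power of two.
 */
static bool reserve_fills_whole_bucket(unsigned int bucket_sectors)
{
	return is_power_of_two(bucket_sectors) &&
	       bucket_sectors == TOTAL_RESERVE_SECTORS;
}
```

Under that assumption the check never succeeds, because 13 is not a
power of two, so the case of a completely reserved bucket does not need
separate handling.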

Again, if in future the reserved space turns out to be not enough, let's
extend it then.

Signed-off-by: Coly Li <colyli@xxxxxxx>
---
  drivers/md/bcache/journal.c | 89 +++++++++++++++++++++++++++++++++------------
  drivers/md/bcache/journal.h |  1 +
  2 files changed, 66 insertions(+), 24 deletions(-)

Reviewed-by: Hannes Reinecke <hare@xxxxxxxx>

Cheers,

Hannes
--
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@xxxxxxx			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


