Hey Kent and Coly,

It turns out that, at least for the disk image that reproduces the issue,
the closure from bch_btree_set_root() to bch_journal_meta() doesn't make a
difference; the stall is in bch_journal() -> journal_wait_for_write().

So the previous suggestion to skip bch_journal_meta() altogether works to
get things going, provided we check for the journal replay/full case.

What do you think of this patch? It simply checks the conditions from
run_cache_set() for bch_journal_replay(). (It starts with
unlikely(!CACHE_SET_RUNNING) to quickly get to the usual case, and adds an
extra-strict check for !gc_thread, just in case. And it checks
journal_full() only, as the !journal_full() case in journal_wait_for_write()
seems to be handled via another function, per the comment.)

This works with the disk image here.

Thanks!
Mauricio

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 72abe5cf4b12..bedeffc3ae28 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2477,9 +2477,6 @@ int bch_btree_insert(struct cache_set *c, struct keylist *keys,
 void bch_btree_set_root(struct btree *b)
 {
 	unsigned int i;
-	struct closure cl;
-
-	closure_init_stack(&cl);
 
 	trace_bcache_btree_set_root(b);
 
@@ -2494,8 +2491,18 @@ void bch_btree_set_root(struct btree *b)
 
 	b->c->root = b;
 
-	bch_journal_meta(b->c, &cl);
-	closure_sync(&cl);
+	/* Don't journal during replay if journal is full (prevents deadlock) */
+	if (unlikely(!test_bit(CACHE_SET_RUNNING, &b->c->flags)) &&
+	    CACHE_SYNC(&b->c->cache->sb) && b->c->gc_thread == NULL &&
+	    journal_full(&b->c->journal)) {
+		pr_info("Not journaling new root (replay with full journal)\n");
+	} else {
+		struct closure cl;
+
+		closure_init_stack(&cl);
+		bch_journal_meta(b->c, &cl);
+		closure_sync(&cl);
+	}
 }
 
 /* Map across nodes or keys */