On 2019/4/23 3:02 下午, Hannes Reinecke wrote: > On 4/19/19 6:04 PM, Coly Li wrote: >> Currently run_cache_set() has no return value, if there is failure in >> bch_journal_replay(), the caller of run_cache_set() has no idea about >> such failure and just continue to execute following code after >> run_cache_set(). The internal failure is triggered inside >> bch_journal_replay() and being handled in async way. This behavior is >> inefficient, while failure handling inside bch_journal_replay(), cache >> register code is still running to start the cache set. Registering and >> unregistering code running as same time may introduce some rare race >> condition, and make the code to be more hard to be understood. >> >> This patch adds return value to run_cache_set(), and returns -EIO if >> bch_journal_rreplay() fails. Then caller of run_cache_set() may detect >> such failure and stop registering code flow immedidately inside >> register_cache_set(). >> >> If journal replay fails, run_cache_set() can report error immediately >> to register_cache_set(). This patch makes the failure handling for >> bch_journal_replay() be in synchronized way, easier to understand and >> debug, and avoid poetential race condition for register-and-unregister >> in same time. >> >> Signed-off-by: Coly Li <colyli@xxxxxxx> >> --- >> drivers/md/bcache/super.c | 17 ++++++++++++----- >> 1 file changed, 12 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c >> index a697a3a923cd..036bffad0bfe 100644 >> --- a/drivers/md/bcache/super.c >> +++ b/drivers/md/bcache/super.c >> @@ -1775,7 +1775,7 @@ struct cache_set *bch_cache_set_alloc(struct >> cache_sb *sb) >> return NULL; >> } >> -static void run_cache_set(struct cache_set *c) >> +static int run_cache_set(struct cache_set *c) >> { >> const char *err = "cannot allocate memory"; >> struct cached_dev *dc, *t; >> @@ -1869,7 +1869,9 @@ static void run_cache_set(struct cache_set *c) >> if (j->version < BCACHE_JSET_VERSION_UUID) >> __uuid_write(c); >> - bch_journal_replay(c, &journal); >> + err = "bcache: replay journal failed"; >> + if (bch_journal_replay(c, &journal)) >> + goto err; >> } else { >> pr_notice("invalidating existing data"); >> @@ -1937,11 +1939,13 @@ static void run_cache_set(struct cache_set *c) >> flash_devs_run(c); >> set_bit(CACHE_SET_RUNNING, &c->flags); >> - return; >> + return 0; >> err: >> closure_sync(&cl); >> /* XXX: test this, it's broken */ >> bch_cache_set_error(c, "%s", err); >> + >> + return -EIO; >> } >> > Shouldn't you differentiate between -EIO if the journal replay failed > and -ENOMEM? Hi Hannes, Currently I just return an error code to caller to indicate something wrong, a more specific error code should be more informative. Let me think how to do it. Thanks. -- Coly Li