Re: [RFC PATCH v2 06/16] bcache: add failure check to run_cache_set() for journal replay

Coly Li <colyli@xxxxxxx> · Tue, 23 Apr 2019 15:09:53 +0800

On 2019/4/23 3:02 下午, Hannes Reinecke wrote:
> On 4/19/19 6:04 PM, Coly Li wrote:
>> Currently run_cache_set() has no return value, if there is failure in
>> bch_journal_replay(), the caller of run_cache_set() has no idea about
>> such failure and just continue to execute following code after
>> run_cache_set().  The internal failure is triggered inside
>> bch_journal_replay() and being handled in async way. This behavior is
>> inefficient, while failure handling inside bch_journal_replay(), cache
>> register code is still running to start the cache set. Registering and
>> unregistering code running as same time may introduce some rare race
>> condition, and make the code to be more hard to be understood.
>>
>> This patch adds return value to run_cache_set(), and returns -EIO if
>> bch_journal_rreplay() fails. Then caller of run_cache_set() may detect
>> such failure and stop registering code flow immedidately inside
>> register_cache_set().
>>
>> If journal replay fails, run_cache_set() can report error immediately
>> to register_cache_set(). This patch makes the failure handling for
>> bch_journal_replay() be in synchronized way, easier to understand and
>> debug, and avoid poetential race condition for register-and-unregister
>> in same time.
>>
>> Signed-off-by: Coly Li <colyli@xxxxxxx>
>> ---
>>   drivers/md/bcache/super.c | 17 ++++++++++++-----
>>   1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index a697a3a923cd..036bffad0bfe 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>> @@ -1775,7 +1775,7 @@ struct cache_set *bch_cache_set_alloc(struct
>> cache_sb *sb)
>>       return NULL;
>>   }
>>   -static void run_cache_set(struct cache_set *c)
>> +static int run_cache_set(struct cache_set *c)
>>   {
>>       const char *err = "cannot allocate memory";
>>       struct cached_dev *dc, *t;
>> @@ -1869,7 +1869,9 @@ static void run_cache_set(struct cache_set *c)
>>           if (j->version < BCACHE_JSET_VERSION_UUID)
>>               __uuid_write(c);
>>   -        bch_journal_replay(c, &journal);
>> +        err = "bcache: replay journal failed";
>> +        if (bch_journal_replay(c, &journal))
>> +            goto err;
>>       } else {
>>           pr_notice("invalidating existing data");
>>   @@ -1937,11 +1939,13 @@ static void run_cache_set(struct cache_set *c)
>>       flash_devs_run(c);
>>         set_bit(CACHE_SET_RUNNING, &c->flags);
>> -    return;
>> +    return 0;
>>   err:
>>       closure_sync(&cl);
>>       /* XXX: test this, it's broken */
>>       bch_cache_set_error(c, "%s", err);
>> +
>> +    return -EIO;
>>   }
>>   
> Shouldn't you differentiate between -EIO if the journal replay failed
> and -ENOMEM?

Hi Hannes,

Currently I just return an error code to caller to indicate something
wrong, a more specific error code should be more informative. Let me
think how to do it.

Thanks.

-- 

Coly Li