Re: [PATCH v5 4/8] md/r5cache: write part of r5cache

NeilBrown <neilb@xxxxxxxx> · Wed, 19 Oct 2016 11:53:21 +1100

On Fri, Oct 14 2016, Song Liu wrote:

>> On Oct 13, 2016, at 11:53 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>> 
>> On Thu, Oct 13 2016, Song Liu wrote:
>>> 
>>> For RMW, the code allocates an extra page for each data block
>>> being updated.  This is stored in r5dev->page and the old data
>>> is read into it.  Then the prexor calculation subtracts ->page
>>> from the parity block, and the reconstruct calculation adds the
>>> ->orig_page data back into the parity block.
>> 
>> What happens if the alloc_page() fails?
>
> That will be tough, but solvable. We can
>     read old data to page
>     do prexor 
>     read new data from journal device to page
>     do xor 
>     do the rest of the work. 
>
> Or we can force the code to rcw, which does not need extra page. 
> But rcw does not always work in degraded mode. So, this is a good 
> reason not to do write-back in degraded mode...
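
For reference, the single-page sequence listed above can be modelled in
userspace as below.  This is only a rough sketch of the parity arithmetic,
not kernel code: rmw_one_page(), xor_into() and the toy block size BLK are
made-up names, and the memcpy() calls stand in for the reads from the member
disk and from the journal device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 8	/* toy block size in bytes; a real stripe uses whole pages */

/* XOR src into dst, byte by byte. */
static void xor_into(unsigned char *dst, const unsigned char *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] ^= src[i];
}

/*
 * Update parity for one rewritten data block using a single scratch
 * buffer.  disk_old models the block on the member disk, journal_new
 * models the copy of the new data held on the journal device.
 */
static void rmw_one_page(unsigned char *parity,
			 const unsigned char *disk_old,
			 const unsigned char *journal_new,
			 unsigned char *scratch)
{
	assert(parity && disk_old && journal_new && scratch);

	memcpy(scratch, disk_old, BLK);		/* read old data to page */
	xor_into(parity, scratch, BLK);		/* prexor: take old data out */
	memcpy(scratch, journal_new, BLK);	/* read new data from journal */
	xor_into(parity, scratch, BLK);		/* xor: put new data in */
}
```

The point of the ordering is that the scratch buffer is reused, so only one
page is live at a time, at the cost of an extra read from the journal.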

Prohibiting write-back in degraded mode would not be enough to ensure
that you can always use rcw.  The array can become degraded after you
make the decision to use caching, and before you need to read old data
for rmw.

I would suggest a small (2 entry?) mempool where each entry in the
mempool holds enough pages to complete an rmw.  Only use the mempool if
an alloc_page() fails.
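
To illustrate the fallback order, here is a rough userspace model.  It is
a sketch under assumptions, not the kernel mempool API: struct rmw_reserve,
rmw_pages_get(), rmw_pages_put() and PAGES_PER_RMW are hypothetical names,
malloc() stands in for alloc_page(), and the alloc_fails argument only
simulates an allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

#define RESERVE_ENTRIES	2	/* the "2 entry?" guess from above */
#define PAGES_PER_RMW	4	/* hypothetical worst case for one rmw */
#define TOY_PAGE_SIZE	4096

struct rmw_pages {
	void *page[PAGES_PER_RMW];
	int from_reserve;	/* 1 if this set belongs to the reserve */
};

struct rmw_reserve {
	struct rmw_pages entry[RESERVE_ENTRIES];
	int in_use[RESERVE_ENTRIES];
};

/* Preallocate every reserve entry up front; returns -1 on failure. */
static int reserve_init(struct rmw_reserve *r)
{
	for (int e = 0; e < RESERVE_ENTRIES; e++) {
		r->in_use[e] = 0;
		r->entry[e].from_reserve = 1;
		for (int p = 0; p < PAGES_PER_RMW; p++) {
			r->entry[e].page[p] = malloc(TOY_PAGE_SIZE);
			if (!r->entry[e].page[p])
				return -1;	/* sketch: no unwind here */
		}
	}
	return 0;
}

/* Try the normal allocator first; touch the reserve only if that fails. */
static struct rmw_pages *rmw_pages_get(struct rmw_reserve *r, int alloc_fails)
{
	if (!alloc_fails) {
		struct rmw_pages *s = calloc(1, sizeof(*s));

		if (s) {
			int p;

			for (p = 0; p < PAGES_PER_RMW; p++) {
				s->page[p] = malloc(TOY_PAGE_SIZE);
				if (!s->page[p])
					break;
			}
			if (p == PAGES_PER_RMW)
				return s;
			while (p--)
				free(s->page[p]);
			free(s);
		}
	}
	for (int e = 0; e < RESERVE_ENTRIES; e++) {
		if (!r->in_use[e]) {
			r->in_use[e] = 1;
			return &r->entry[e];
		}
	}
	return NULL;	/* reserve exhausted: caller has to wait and retry */
}

/* Return a page set: reserve entries are recycled, others are freed. */
static void rmw_pages_put(struct rmw_reserve *r, struct rmw_pages *s)
{
	assert(s);
	if (s->from_reserve) {
		r->in_use[s - r->entry] = 0;
		return;
	}
	for (int p = 0; p < PAGES_PER_RMW; p++)
		free(s->page[p]);
	free(s);
}
```

Because each entry holds a complete set of pages, a stripe that gets an
entry can always finish its rmw and hand the entry back, so forward progress
does not depend on further allocations succeeding.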

>> 
>>> +
>>> +void r5c_handle_cached_data_endio(struct r5conf *conf,
>>> +	  struct stripe_head *sh, int disks, struct bio_list *return_bi)
>>> +{
>>> +	int i;
>>> +
>>> +	for (i = sh->disks; i--; ) {
>>> +		if (test_bit(R5_InCache, &sh->dev[i].flags) &&
>>> +		    sh->dev[i].written) {
>> 
>> Is it possible for R5_InCache to be set, but 'written' to be NULL ???
>
> Yes, it is possible. A stripe may go through "write data to journal, return IO" multiple
> times before parity calculation. When it comes here the second time, a dev written the
> first time will have R5_InCache set, but its 'written' will be NULL.

OK, that makes sense.
So is it possible for 'written' to be set, but R5_InCache to be clear?
i.e. do we really need to test R5_InCache here?

>>> 
>>> static void r5l_io_run_stripes(struct r5l_io_unit *io)
>>> @@ -483,7 +566,8 @@ static int r5l_log_stripe(struct r5l_log *log, struct stripe_head *sh,
>>> 	io = log->current_io;
>>> 
>>> 	for (i = 0; i < sh->disks; i++) {
>>> -		if (!test_bit(R5_Wantwrite, &sh->dev[i].flags))
>>> +		if (!test_bit(R5_Wantwrite, &sh->dev[i].flags) &&
>>> +		    !test_bit(R5_Wantcache, &sh->dev[i].flags))
>>> 			continue;
>> 
>> If you changed R5_Wantcache to R5_Wantjournal, and always set it on blocks
>> that were destined for the journal, then this would just be
>> 
>> 		if (!test_bit(R5_Wantjournal, &sh->dev[i].flags))
>> 
>> which would make lots of sense...  Just a thought.
>
> We set R5_Wantwrite in multiple places. If we simplify the code here, we will need to make
> those places aware of journal. I guess that is not ideal either? 

Maybe...
We have so many state flags that I like to be very cautious about adding
more, and to make sure they have a very well defined meaning that
doesn't overlap with other flags too much.
The above code suggests that Wantwrite and Wantcache overlap to some
extent.

Could we discard Wantcache and just use Wantwrite combined with InCache?
Wantwrite means that the block needs to be written to the RAID.
If InCache isn't set, it also needs to be written to the cache (if the
cache is being used).
Then the above code would be
   if (!test_bit(R5_Wantwrite, &sh->dev[i].flags) ||
       test_bit(R5_InCache, &sh->dev[i].flags))
      continue;

which means "if we don't want to write this, or if it is already in the
cache, then nothing to do here".
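
Spelled out as a standalone predicate, the idea would look roughly like the
following.  This is a userspace sketch only: the TOY_* bit values and
needs_journal_write() are made up for illustration and are not the raid5
flag definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag bits; the values are arbitrary and only for this sketch. */
enum {
	TOY_WANTWRITE	= 1 << 0,	/* block must be written to the RAID */
	TOY_INCACHE	= 1 << 1,	/* block is already in the cache */
};

/*
 * Does this block still need to go to the journal?  Only if a cache is
 * in use, the block is wanted for writing, and it is not cached yet.
 */
static bool needs_journal_write(unsigned long flags, bool cache_in_use)
{
	/* Reject bits this sketch does not know about. */
	assert((flags & ~(unsigned long)(TOY_WANTWRITE | TOY_INCACHE)) == 0);

	if (!cache_in_use)
		return false;
	if (!(flags & TOY_WANTWRITE) || (flags & TOY_INCACHE))
		return false;	/* the "continue" case in the loop above */
	return true;
}
```

With only two flags there are four states per block, and exactly one of
them (Wantwrite set, InCache clear) leads to a journal write, which is what
keeps the two meanings from overlapping.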

Maybe.

>
>  
>> 
>>> }
>>> 
>>> -static void r5l_wake_reclaim(struct r5l_log *log, sector_t space)
>>> +void r5l_wake_reclaim(struct r5l_log *log, sector_t space)
>> 
>> Why are you removing the 'static' here?  You don't call it from any
>> other file.
>
> In next patch, it is called in raid.c.

So remove 'static' in the next patch please.

Thanks,
NeilBrown