Re: [PATCH v5 4/8] md/r5cache: write part of r5cache

Song Liu <songliubraving@xxxxxx> · Fri, 21 Oct 2016 20:42:08 +0000

> On Oct 18, 2016, at 5:53 PM, NeilBrown <neilb@xxxxxxxx> wrote:
> 
> On Fri, Oct 14 2016, Song Liu wrote:
> 
>>> On Oct 1st of the work. 
>> 
>> Or we can force the code to rcw, which does not need extra page. 
>> But rcw, does not always work in degraded mode. So, this is a good 
>> reason not to do write-back in degraded mode...
> 
> Prohibiting write-back in degraded mode would not be enough to ensure
> that you can always use rcw.  The array can become degraded after you
> make the decision to use caching, and before to need to read old data
> for rmw.
> 
> I would suggest a small (2 entry?) mempool where each entry in the
> mempool holds enough pages to complete an rmw.  Only use the mempool if
> an alloc_page() fails.
> 
This is a great idea. I will try it. 

>>> 
>>>> +
>>>> +void r5c_handle_cached_data_endio(struct r5conf *conf,
>>>> +	  struct stripe_head *sh, int disks, struct bio_list *return_bi)
>>>> +{
>>>> +	int i;
>>>> +
>>>> +	for (i = sh->disks; i--; ) {
>>>> +		if (test_bit(R5_InCache, &sh->dev[i].flags) &&
>>>> +		    sh->dev[i].written) {
>>> 
>>> Is it possible for R5_InCache to be set, but 'written' to be NULL ???
>> 
>> Yes, it is possible. A stripe may go through "write data to journal, return IO" multiple
>> times before parity calculation. When it comes here the second time, dev written in the 
>> first time will have R5_InCache set, but its written will be NULL. 
> 
> OK, that makes sense.
> So is it possible for 'written' to be set, but R5_InCache to be clear?
> i.e. do we really need to test R5_InCache here?

In current implementation, there is only one log_io per sh, so we should be fine just check 
'written'. Let me double check. 

> 
>>>> 
>>>> static void r5l_io_run_stripes(struct r5l_io_unit *io)
>>>> @@ -483,7 +566,8 @@ static int r5l_log_stripe(struct r5l_log *log, struct stripe_head *sh,
>>>> 	io = log->current_io;
>>>> 
>>>> 	for (i = 0; i < sh->disks; i++) {
>>>> -		if (!test_bit(R5_Wantwrite, &sh->dev[i].flags))
>>>> +		if (!test_bit(R5_Wantwrite, &sh->dev[i].flags) &&
>>>> +		    !test_bit(R5_Wantcache, &sh->dev[i].flags))
>>>> 			continue;
>>> 
>>> If changed R5_Wantcache to R5_Wantjournal, and always set it on blocks
>>> that were destined for the journal, then this would just be
>>> 
>>> 		if (!test_bit(R5_Wantjournal, &sh->dev[i].flags))
>>> 
>>> which would make lots of sense...  Just a thought.
>> 
>> We set R5_Wantwrite in multiple places. If we simplify the code here, we will need to make
>> those places aware of journal. I guess that is not ideal either? 
> 
> Maybe...
> We have so many state flags that I like to be very cautious about adding
> more, and to make sure they have a very well defined meaning that
> doesn't overlap with other flags too much.
> The above code suggests that Wantwrite and Wantcache overlap to some
> extent.
> 
> Could we discard Wantcache and just use Wantwrite combined with InCache?
> Wantwrite means that the block needed to be written to the RAID.
> If InCache isn't set, it also needs to be written to the cache (if the
> cache is being used).
> Then the above code would be
>   if (!test_bit(R5_Wantwrite) || test_bit(R5_InCache))
>      continue;
> 
> which means "if we don't want to write this, or if it is already in the
> cache, then nothing to do here".
> 
> Maybe.

I am not sure whether we can totally remove Wantcache. Let me try it. 

Thanks,
Song--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html