> On Oct 18, 2016, at 5:53 PM, NeilBrown <neilb@xxxxxxxx> wrote: > > On Fri, Oct 14 2016, Song Liu wrote: > >>> On Oct 1st of the work. >> >> Or we can force the code to rcw, which does not need extra page. >> But rcw, does not always work in degraded mode. So, this is a good >> reason not to do write-back in degraded mode... > > Prohibiting write-back in degraded mode would not be enough to ensure > that you can always use rcw. The array can become degraded after you > make the decision to use caching, and before to need to read old data > for rmw. > > I would suggest a small (2 entry?) mempool where each entry in the > mempool holds enough pages to complete an rmw. Only use the mempool if > an alloc_page() fails. > This is a great idea. I will try it. >>> >>>> + >>>> +void r5c_handle_cached_data_endio(struct r5conf *conf, >>>> + struct stripe_head *sh, int disks, struct bio_list *return_bi) >>>> +{ >>>> + int i; >>>> + >>>> + for (i = sh->disks; i--; ) { >>>> + if (test_bit(R5_InCache, &sh->dev[i].flags) && >>>> + sh->dev[i].written) { >>> >>> Is it possible for R5_InCache to be set, but 'written' to be NULL ??? >> >> Yes, it is possible. A stripe may go through "write data to journal, return IO" multiple >> times before parity calculation. When it comes here the second time, dev written in the >> first time will have R5_InCache set, but its written will be NULL. > > OK, that makes sense. > So is it possible for 'written' to be set, but R5_InCache to be clear? > i.e. do we really need to test R5_InCache here? In current implementation, there is only one log_io per sh, so we should be fine just check 'written'. Let me double check. > >>>> >>>> static void r5l_io_run_stripes(struct r5l_io_unit *io) >>>> @@ -483,7 +566,8 @@ static int r5l_log_stripe(struct r5l_log *log, struct stripe_head *sh, >>>> io = log->current_io; >>>> >>>> for (i = 0; i < sh->disks; i++) { >>>> - if (!test_bit(R5_Wantwrite, &sh->dev[i].flags)) >>>> + if (!test_bit(R5_Wantwrite, &sh->dev[i].flags) && >>>> + !test_bit(R5_Wantcache, &sh->dev[i].flags)) >>>> continue; >>> >>> If changed R5_Wantcache to R5_Wantjournal, and always set it on blocks >>> that were destined for the journal, then this would just be >>> >>> if (!test_bit(R5_Wantjournal, &sh->dev[i].flags)) >>> >>> which would make lots of sense... Just a thought. >> >> We set R5_Wantwrite in multiple places. If we simplify the code here, we will need to make >> those places aware of journal. I guess that is not ideal either? > > Maybe... > We have so many state flags that I like to be very cautious about adding > more, and to make sure they have a very well defined meaning that > doesn't overlap with other flags too much. > The above code suggests that Wantwrite and Wantcache overlap to some > extent. > > Could we discard Wantcache and just use Wantwrite combined with InCache? > Wantwrite means that the block needed to be written to the RAID. > If InCache isn't set, it also needs to be written to the cache (if the > cache is being used). > Then the above code would be > if (!test_bit(R5_Wantwrite) || test_bit(R5_InCache)) > continue; > > which means "if we don't want to write this, or if it is already in the > cache, then nothing to do here". > > Maybe. I am not sure whether we can totally remove Wantcache. Let me try it. Thanks, Song-- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html