Re: [PATCH v6 03/11] md/r5cache: State machine for raid5-cache write back mode

NeilBrown <neilb@xxxxxxxx> · Wed, 16 Nov 2016 11:17:13 +1100

On Fri, Nov 11 2016, Song Liu wrote:

> This patch adds state machine for raid5-cache. With log device, the
> raid456 array could operate in two different modes (r5c_journal_mode):
>   - write-back (R5C_MODE_WRITE_BACK)
>   - write-through (R5C_MODE_WRITE_THROUGH)
>
> Existing code of raid5-cache only has write-through mode. For write-back
> cache, it is necessary to extend the state machine.
>
> With write-back cache, every stripe could operate in two different
> modes:
>   - caching
>   - writing-out
>
> In caching mode, the stripe handles writes as:
>   - write to journal
>   - return IO
>
> In writing-out mode, the stripe behaviors as a stripe in write through
> mode R5C_MODE_WRITE_THROUGH.
>
> STRIPE_R5C_WRITE_OUT is added to sh->state to differentiate caching and
> writing-out mode.
>
> When the array is write-through, stripes also go between caching mode
> and writing-out mode.

This bothers me.  Why would a stripe *ever* be in "caching mode" (or
"caching phase") when the array is in write-through?  It doesn't seem to
make sense.

A stripe_head goes through several states/stages/phases/things.

 1/ write data blocks to journal
 2/ reading missing data blocks and compute parity
 3/ write data and parity to journal
 4/ write data and parity to RAID

When there is no log, we only perform 2 and 4
When there is a log in WRITE_THOUGH we perform 2,3,4
When there is a log in WRITE_BACK we perform 1 (maybe several times) 2 3 4.

Step 2 is initiated by calling handle_stripe_dirtying().
A new function is introduced "r5c_handle_stripe_dirtying()" which
determines if handle_stripe_dirtying() should do anything.
(It returns '0' if it shouldn't proceed, and -EAGAIN if it should,
 which seems a little strange).

If there is no log, or if STRIPE_R5C_WRITE_OUT is set, -EAGAIN is
returned.
Otherwise if in MODE_WRITE_THROUGH, STRIPE_R5C_WRITE_OUT is set and
-EAGAIN is returned.
else (in next patch) are more complex determination is made, but
-EAGAIN is only ever returns with STRIPE_R5C_WRITE_OUT set, or if log == NULL.

So STRIPE_R5C_WRITE_OUT is (sometimes) set to enter step 2, and cleared
when step 4 completes.

STRIPE_R5C_WRITE_OUT means that the 2(3)4 writeout sequence has
commenced and not yet completed.  I would like to see it *always* set
when that happens, including when log==NULL.
I would probably even rename r5c_handle_stripe_dirtying() to something
like  r5c_check_need_writeout() which would check to see if writeout was
needed, and would set STRIPE_R5C_WRITE_OUT if it was.
Then handle_stripe_dirtying() would be called iff STRIPE_R5C_WRITE_OUT
were set. (it could even be named to handle_stripe_writeout()??)

There are two actions that can be taken when where are ->towrite blocks
on a stripe.  We can enter WRITE_OUT, or they can be cached in the
journal.  Also we can enter WRITE_OUT when a stripe needs to be removed
From memory or from the journal.
This makes "writeout" and "cache" seem more like "actions" than states,
modes, or phases.  Naming is hard.

Trying to understand R5_InJournal:
 It is set (on pd_idx) when step 3 completes.
 It is only used (in this patch) in r5c_finish_stripe_write_out()
   to make sure WRITE_OUT isn't cleared until after InJournal is
 cleared.

So setting InJournal[pd_idx] marks the end of step 3 and clearing WRITE_OUT
marks the end of step 4.

I think that observation helps me understand the code a bit better.

Thanks,
NeilBrown
Attachment:
signature.asc

Description: PGP signature