Re: [PATCH V4 00/13] MD: a caching layer for raid5/6

Shaohua Li <shli@xxxxxx> · Fri, 10 Jul 2015 10:48:45 -0700

On Fri, Jul 10, 2015 at 04:42:09PM +1000, NeilBrown wrote:
> On Thu, 9 Jul 2015 22:18:15 -0700 Shaohua Li <shli@xxxxxx> wrote:
> 
> > On Fri, Jul 10, 2015 at 03:10:44PM +1000, NeilBrown wrote:
> > > On Thu, 9 Jul 2015 21:52:43 -0700 Shaohua Li <shli@xxxxxx> wrote:
> > > 
> > > > On Fri, Jul 10, 2015 at 02:36:56PM +1000, NeilBrown wrote:
> > > > > On Thu, 9 Jul 2015 21:08:49 -0700 Shaohua Li <shli@xxxxxx> wrote:
> > > > > 
> > > 
> > > > > There is also the issue of what action commits a previous transaction.
> > > > > I'm not sure what you had.  I'm suggesting that each metadata block
> > > > > commits previous transactions.  Is that a close-enough match to what
> > > > > you had?
> > > > 
> > > > What did you mean about a transaction? In my implementation, metadata
> > > > block and followed stripe data/parity consist of an io unit. io units can
> > > > be finished out of order. but if io unit has flush request (the data has
> > > > flush/flush bio or metadata is a flush block), the io unit can only
> > > > start after all previous io units and disk cache flush finish. Such io
> > > > unit is strictly ordered. The log patch describes this behavior. Does it
> > > > match?
> > > 
> > > Yes, a "transaction" is an "io unit".  The flushing is the same.
> > > I just couldn't remember how, when reading the log on restart, you
> > > determined if a given "io unit" was reliably consistent, or whether it
> > > should be ignored (having possibly only partially been written).
> > 
> > The metadata block has a checksum for data of the block. data/parity has
> > checksum stored in metadata block. This way we can know if metadata and
> > data is consistent.
> > 
> 
> OK .. though I'm not totally sold on the value of checksums.  When a
> checksum doesn't match, that means something.  When a checksum does
> match, it could just be a co-incidence.
> I'd rather have a process that made checksums unnecessary, and only use
> the checksums as a double-check.

We could do something like this: write the metadata and data, wait for
them to complete, then write another metadata block. The second metadata
block indicates that the first is on disk. But this can hurt performance
badly. I think checksums should be fine. A match might be just a
coincidence, but the probability should be extremely low. jbd2 uses
checksums now too.
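
To make the checksum approach concrete, here is a rough userspace sketch
of how recovery could decide whether an io unit is consistent. It is not
the code from the patch set: struct meta_block, META_MAGIC, PAGE_SZ,
MAX_PAGES, io_unit_seal() and io_unit_valid() are all hypothetical names,
and a plain bitwise CRC-32 stands in for whatever checksum the log
actually uses. An io unit whose metadata checksum or any data/parity
checksum does not match is treated as partially written and ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ    4096u        /* hypothetical data/parity page size */
#define MAX_PAGES  8u           /* hypothetical pages per io unit */
#define META_MAGIC 0x6d643543u  /* hypothetical magic number */

/* Hypothetical metadata block; fields are laid out to avoid padding. */
struct meta_block {
	uint32_t magic;
	uint32_t n_pages;
	uint64_t seq;                   /* position of the io unit in the log */
	uint32_t page_csum[MAX_PAGES];  /* one checksum per data/parity page */
	uint32_t self_csum;             /* checksum of this block, field zeroed */
	uint32_t reserved;
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); an assumed stand-in. */
static uint32_t crc32_buf(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];
		for (int b = 0; b < 8; b++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Checksum of the metadata block with self_csum treated as zero. */
static uint32_t meta_csum(const struct meta_block *mb)
{
	struct meta_block tmp = *mb;

	tmp.self_csum = 0;
	return crc32_buf(&tmp, sizeof(tmp));
}

/* Fill in the metadata block for n_pages contiguous pages. */
static void io_unit_seal(struct meta_block *mb, uint64_t seq,
			 const uint8_t *pages, uint32_t n_pages)
{
	assert(n_pages <= MAX_PAGES);
	memset(mb, 0, sizeof(*mb));
	mb->magic = META_MAGIC;
	mb->n_pages = n_pages;
	mb->seq = seq;
	for (uint32_t i = 0; i < n_pages; i++)
		mb->page_csum[i] = crc32_buf(pages + (size_t)i * PAGE_SZ, PAGE_SZ);
	mb->self_csum = meta_csum(mb);
}

/* Recovery check: 1 if metadata and all pages match, 0 otherwise. */
static int io_unit_valid(const struct meta_block *mb, const uint8_t *pages)
{
	if (mb->magic != META_MAGIC || mb->n_pages > MAX_PAGES)
		return 0;
	if (mb->self_csum != meta_csum(mb))
		return 0;               /* metadata block itself is torn */
	for (uint32_t i = 0; i < mb->n_pages; i++)
		if (crc32_buf(pages + (size_t)i * PAGE_SZ, PAGE_SZ) != mb->page_csum[i])
			return 0;       /* a data/parity page is torn */
	return 1;
}
```

With an ideal 32-bit checksum, a torn io unit with random contents would
pass this check by coincidence with probability about 2^-32, which is the
kind of rate I mean by "extremely low".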

Thanks,
Shaohua