On Fri, Jul 10, 2015 at 04:42:09PM +1000, NeilBrown wrote:
> On Thu, 9 Jul 2015 22:18:15 -0700 Shaohua Li <shli@xxxxxx> wrote:
>
> > On Fri, Jul 10, 2015 at 03:10:44PM +1000, NeilBrown wrote:
> > > On Thu, 9 Jul 2015 21:52:43 -0700 Shaohua Li <shli@xxxxxx> wrote:
> > >
> > > > On Fri, Jul 10, 2015 at 02:36:56PM +1000, NeilBrown wrote:
> > > > > On Thu, 9 Jul 2015 21:08:49 -0700 Shaohua Li <shli@xxxxxx> wrote:
> > > > >
> > > > > There is also the issue of what action commits a previous transaction.
> > > > > I'm not sure what you had. I'm suggesting that each metadata block
> > > > > commits previous transactions. Is that a close-enough match to what
> > > > > you had?
> > > >
> > > > What did you mean about a transaction? In my implementation, metadata
> > > > block and followed stripe data/parity consist of an io unit. io units can
> > > > be finished out of order. but if io unit has flush request (the data has
> > > > flush/flush bio or metadata is a flush block), the io unit can only
> > > > start after all previous io units and disk cache flush finish. Such io
> > > > unit is strictly ordered. The log patch describes this behavior. Does it
> > > > match?
> > >
> > > Yes, a "transaction" is an "io unit". The flushing is the same.
> > > I just couldn't remember how, when reading the log on restart, you
> > > determined if a given "io unit" was reliably consistent, or whether it
> > > should be ignored (having possibly only partially been written).
> >
> > The metadata block has a checksum for data of the block. data/parity has
> > checksum stored in metadata block. This way we can know if metadata and
> > data is consistent.
> >
>
> OK .. though I'm not totally sold on the value of checksums. When a
> checksum doesn't match, that means something. When a checksum does
> match, it could just be a co-incidence.
> I'd rather have a process that made checksums unnecessary, and only use
> the checksums as a double-check.
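To make the scheme in the quoted text concrete, here is a rough sketch of the
recovery-time check: the metadata block carries a checksum over itself plus
one checksum per data/parity block, and an io unit is replayed only if all of
them match. The layout, the field names (MAGIC, seq), the helper functions and
the use of zlib.crc32 are all assumptions made for illustration; this is not
the on-disk format from the log patch.

```python
import struct
import zlib

# Hypothetical layout, for illustration only (not the actual log format):
# header = magic | sequence number | number of payload blocks | metadata checksum
MAGIC = 0x4C4F4721
HEADER = struct.Struct("<IQII")


def build_io_unit(seq, payloads):
    """Return (metadata block, payload blocks) for one io unit."""
    # One checksum per data/parity block, stored inside the metadata block.
    csums = b"".join(struct.pack("<I", zlib.crc32(p)) for p in payloads)
    # The metadata checksum is computed with its own field set to zero.
    zeroed = HEADER.pack(MAGIC, seq, len(payloads), 0) + csums
    meta = HEADER.pack(MAGIC, seq, len(payloads), zlib.crc32(zeroed)) + csums
    return meta, list(payloads)


def io_unit_is_valid(meta, payloads):
    """Decide on restart whether an io unit was written completely."""
    if len(meta) < HEADER.size:
        return False
    magic, seq, count, meta_csum = HEADER.unpack_from(meta)
    if magic != MAGIC or count != len(payloads):
        return False
    if len(meta) != HEADER.size + 4 * count:
        return False
    # Step 1: the metadata block must be self-consistent.
    zeroed = HEADER.pack(magic, seq, count, 0) + meta[HEADER.size:]
    if zlib.crc32(zeroed) != meta_csum:
        return False
    # Step 2: every data/parity block must match its stored checksum.
    stored = struct.unpack_from("<%dI" % count, meta, HEADER.size)
    return all(zlib.crc32(p) == c for p, c in zip(payloads, stored))


meta, blocks = build_io_unit(7, [b"data" * 1024, b"parity" * 512])
print(io_unit_is_valid(meta, blocks))                        # intact unit
print(io_unit_is_valid(meta, [blocks[0], blocks[1][:-1] + b"\x00"]))  # torn write
```

A partially written io unit fails one of the two steps and is ignored on
replay. zlib.crc32 is only a stand-in for whichever checksum the log uses.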
We could do something like: write metadata/data, wait, then write another
metadata block. The second metadata block indicates that the first is on
disk. But this could hurt performance badly. I think checksums should be
fine. A match might be just a coincidence, but the probability should be
extremely low. jbd2 uses checksums now too.

Thanks,
Shaohua