Re: [PATCH V4 00/13] MD: a caching layer for raid5/6

On Thu, Jul 16, 2015 at 09:16:53AM +1000, NeilBrown wrote:
> On Wed, 15 Jul 2015 12:49:37 -0700 Shaohua Li <shli@xxxxxx> wrote:
> 
> > On Wed, Jul 15, 2015 at 02:06:41PM +1000, NeilBrown wrote:
> > > On Tue, 14 Jul 2015 20:16:17 -0700 Shaohua Li <shli@xxxxxx> wrote:
> > > 
> 
> > > 
> > > >                                                           I don't
> > > > understand why you object to adding a superblock for cache. The advantage
> > > > is it's self contained. And there is nothing about
> > > > complexity/maintenance, as we can store the most necessary fields into
> > > > the superblock.
> > > 
> > > Because there is precisely 1 number that needs to be stored in the
> > > superblock, and there seems no point having a superblock just to store
> > > one number.
> > > It isn't much extra complexity, but any extra thing is still an extra
> > > thing.
> > > Having the data section of the log device containing just a log is
> > > elegant.  Elegant is good.
> > > If we decided that keeping two copies for superblocks was a good idea
> > > (which I think it is, I just haven't created a "v1.3" layout yet), then
> > > re-using the main superblock for the head-of-log pointer would instantly
> > > give us two copies of that as well.
> > 
> > I think I need 2 fields to find log head/tail in recovery. Currently
> > cache superblock records checkpoint disk position (log tail) and
> > checkpoint sequence number, which can be used to find log head. Just
> > recording log tail doesn't work well (it might work, for example,
> > zeroing sectors before log head, so we can identify log head. But it's
> > really ugly and not efficient). I only found that recovery_offset can be
> > overloaded. Do you have any idea what other fields can be overloaded in
> > the MD superblock?
> 
> If each metadata block contains
>   - a magic number
>   - a checksum of the block
>   - a sequence number
>   - a pointer to the "next" metadata block (which is equivalent to
>     the size of all described data)
>   - a pointer to the tail (oldest active metadata block).
> 
> Then given the address of any block in the log you can easily find the
> head:  walk the "next" pointers forward until you find a block
> that has the wrong magic or checksum or sequence or previous pointer.
> The last block that was consistent is the head.
> 
> You can then find the tail directly, and walk forward processing the
> log.
> 
> Efficiency is not really an issue.  On a clean shutdown (which should
> be the norm), the md superblock will contain a pointer to the head, and
> the "next" block after that can quickly be determined to be invalid.
> On an unclean shutdown it is expected that we need to do a bit more
> work, and skipping forward along the chain to find the head of the log
> is the least of our worries.
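
For reference, the head-finding walk described in the quoted text can be
sketched roughly as below. This is only an illustration under assumptions:
the log is modelled as a dict from position to metadata block, and MAGIC,
Meta, make_meta and find_head are hypothetical names, not anything from the
patch set. The "previous pointer" check is left out because it is not among
the listed fields.

```python
import zlib
from dataclasses import dataclass

MAGIC = 0x6C6F6721  # hypothetical magic value, for illustration only


@dataclass
class Meta:
    magic: int
    seq: int
    next_pos: int   # position of the "next" metadata block
    tail_pos: int   # position of the oldest active metadata block
    checksum: int = 0


def compute_checksum(m):
    # Checksum over every field except the checksum itself.
    payload = f"{m.magic}:{m.seq}:{m.next_pos}:{m.tail_pos}".encode()
    return zlib.crc32(payload)


def make_meta(seq, next_pos, tail_pos):
    m = Meta(MAGIC, seq, next_pos, tail_pos)
    m.checksum = compute_checksum(m)
    return m


def is_valid(m, expected_seq=None):
    if m is None or m.magic != MAGIC or m.checksum != compute_checksum(m):
        return False
    return expected_seq is None or m.seq == expected_seq


def find_head(log, start):
    """Walk "next" pointers from `start`; return (head_pos, tail_pos),
    or None if `start` does not hold a consistent metadata block."""
    cur = log.get(start)
    if not is_valid(cur):
        return None
    pos = start
    for _ in range(len(log)):  # a chain cannot be longer than the log
        nxt = log.get(cur.next_pos)
        if not is_valid(nxt, cur.seq + 1):
            break  # the last consistent block is the head
        pos, cur = cur.next_pos, nxt
    return pos, cur.tail_pos
```

Starting from any consistent block, the walk stops at the first block whose
magic, checksum or sequence number is wrong, and the tail is read from the
head block.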

If the superblock records 2 fields (the log tail and the seq of the log tail), a
metadata block doesn't need 'a pointer to the tail (oldest active metadata
block)'. The log tail/seq pair can help us find the log head easily. Adding a
pointer to the tail in every metadata block is definitely worse than adding a
field in the superblock.

Further, how can you handle the case when the log wraps around? For example, initially the log is
................................
                ^ superblock points to here
then we add metas and wrap around
|meta n-1|meta n|meta 0|meta 1|....
                ^ superblock points to here

Next time we reload the log, the superblock points to a valid meta. Recovery
will think this is an unclean shutdown, so we rescan the whole log disk (because
all metas are valid) and apply all the changes to the raid array. This is
terrible. But if the superblock stores both the log tail and its seq, we will
find that the meta 0 sequence number doesn't match the superblock, and recovery
stops instantly.
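
A toy model of the wrap-around case, to make the difference concrete. It is
only a sketch under assumptions: the log is a ring of equal-sized slots, each
holding the seq of a valid meta (or None), and replay_count is a hypothetical
helper, not code from the patches. Passing expected_seq models a superblock
that stores a sequence number next to the position; leaving it out models a
superblock that stores the position only.

```python
def replay_count(ring, start, expected_seq=None):
    """Return how many metas recovery would replay, starting at `start`.

    ring[i] is the seq of the meta stored in slot i, or None when the slot
    holds no valid meta. With expected_seq=None the first block is accepted
    as long as it is valid (position-only superblock).
    """
    n = len(ring)
    pos = start
    want = expected_seq
    count = 0
    while count < n:
        seq = ring[pos]
        if seq is None:
            break  # invalid block: end of the log
        if want is not None and seq != want:
            break  # sequence mismatch: stale block, stop here
        count += 1
        want = seq + 1
        pos = (pos + 1) % n
    return count


# Metas 0..3 were written starting at slot 2 and wrapped around, so the
# superblock position (slot 2) again holds a valid but stale meta 0.
ring = [2, 3, 0, 1]
position_only = replay_count(ring, start=2)
with_seq = replay_count(ring, start=2, expected_seq=4)
```

In this model the position-only superblock makes recovery walk every slot in
the ring, while the position/seq pair rejects meta 0 on the first comparison.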

Thanks,
Shaohua