Re: md road-map: 2011

On Wed, 16 Feb 2011 14:37:26 -0500 Phil Turmel <philip@xxxxxxxxxx> wrote:

> Hi Neil,
> 
> On 02/16/2011 05:27 AM, NeilBrown wrote:
> > 
> > Hi all,
> >  I wrote this today and posted it at
> > http://neil.brown.name/blog/20110216044002
> > 
> > I thought it might be worth posting it here too...
> > 
> > NeilBrown
> > 
> > 
> > -------------------------
> > 
> > 
> > It is about 2 years since I last published a road-map[1] for md/raid
> > so I thought it was time for another one.  Unfortunately quite a few
> > things on the previous list remain undone, but there has been some
> > progress.
> > 
> > I think one of the problems with some to-do lists is that they aren't
> > detailed enough.  High-level design, low level design, implementation,
> > and testing are all very different sorts of tasks that seem to require
> > different styles of thinking and so are best done separately.  As
> > writing up a road-map is a high-level design task it makes sense to do
> > the full high-level design at that point so that the tasks are
> > detailed enough to be addressed individually with little reference to
> > the other tasks in the list (except what is explicit in the road map).
> > 
> > A particular need I am finding for this road map is to make explicit
> > the required ordering and interdependence of certain tasks.  Hopefully
> > that will make it easier to address them in an appropriate order, and
> > mean that I waste less time saying "this is too hard, I might go read
> > some email instead".
> > 
> > So the following is a detailed road-map for md raid for the coming
> > months.
> > 
> > [1] http://neil.brown.name/blog/20090129234603
> > 
> > Bad Block Log
> > -------------
> [trim /]
> > Bitmap of non-sync regions.
> > ---------------------------
> [trim /]
> 
> It occurred to me that if you go to the trouble (and space and performance)
> to create and maintain metadata for lists of bad blocks, and separate
> metadata for sync status aka "trim", or hot-replace status, or reshape-status,
> or whatever features are dreamt up later, why not create an infrastructure to
> carry all of it efficiently?
> 
> David Brown suggested a multi-level metadata structure.  I concur, but somewhat
> more generic:
> 	Level 1:  Coarse bitmap, set bit indicates 'look at level 2'
> 	Level 2:  Fine bitmap, set bit indicates 'look at level 3'
> 	Level 3:  Extent list, with starting block, length, and feature payload
> 
> The bitmap levels are purely for hot-path performance.
> 
> As an option, it should be possible to spread the detailed metadata through the
> data area, possibly in chunk-sized areas spread out at some user-defined
> interval.  "meta-span", perhaps.  Then resizing partitions that compose an
> array would be less likely to bump up against metadata size limits.  The coarse
> bitmap should stay near the superblock, of course.
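
For concreteness, the three-level lookup proposed above might be sketched
roughly as follows.  This is only an illustration, not md code: the class
name ExtentMap, the two span constants, and the use of Python integers as
bitmaps are all assumptions made for the example.

```python
from bisect import bisect_right

# Assumed granularities; the proposal does not specify any sizes.
COARSE_SPAN = 1 << 20   # blocks covered by one level-1 bit
FINE_SPAN = 1 << 10     # blocks covered by one level-2 bit


class ExtentMap:
    """Hypothetical index: coarse bitmap -> fine bitmap -> extent list."""

    def __init__(self):
        self.coarse = 0     # level 1: Python int used as a bitmap
        self.fine = 0       # level 2: Python int used as a bitmap
        self.starts = []    # level 3: sorted extent start blocks
        self.extents = []   # level 3: (start, length, payload), same order

    def add(self, start, length, payload):
        """Record a non-overlapping extent and set every bit it touches."""
        i = bisect_right(self.starts, start)
        self.starts.insert(i, start)
        self.extents.insert(i, (start, length, payload))
        last = start + length - 1
        for bit in range(start // FINE_SPAN, last // FINE_SPAN + 1):
            self.fine |= 1 << bit
        for bit in range(start // COARSE_SPAN, last // COARSE_SPAN + 1):
            self.coarse |= 1 << bit

    def lookup(self, block):
        """Return the payload of the extent covering block, or None."""
        if not (self.coarse >> (block // COARSE_SPAN)) & 1:
            return None     # level 1 clear: the hot path ends here
        if not (self.fine >> (block // FINE_SPAN)) & 1:
            return None     # level 2 clear
        # Level 3: find the last extent starting at or before block.
        i = bisect_right(self.starts, block) - 1
        if i < 0:
            return None
        start, length, payload = self.extents[i]
        return payload if block < start + length else None
```

With this sketch, after m = ExtentMap() and m.add(5000, 16, "bad-block"),
m.lookup(5003) returns "bad-block", while a lookup of any block in a region
whose coarse bit is clear returns None without touching the extent list.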

This is starting to sound a lot more like a filesystem than a RAID system.

I really don't want there to be so much metadata that I am tempted to spread
it out among the data.  I think that implies too much complexity.

Maybe that is a good place to draw the line:  If some metadata doesn't fit
easily at the start or end of the devices, it has no place in RAID - you
should add it to a filesystem instead.


> 
> Personally, I'd like to see the bad-block feature actually perform block
> remapping, much like hard drives themselves do, but with the option to unmap the
> block if a later write succeeds.  Using one retry per array restart as you
> described makes a lot of sense.  In any case, remapping would retain redundancy
> where applicable short of full drive failure or remap overflow.

If the hard drives already do this, why should md try to do it as well??
If a hard drive has had so many write errors that it has used up all of its
spare space, then it is long past time to replace it.


> 
> My $0.02, of course.

Here in .au, the smallest legal tender is $0.05 - but thanks anyway :-)

NeilBrown

> 
> Phil
