On 02/16/2011 04:44 PM, NeilBrown wrote:
[trim /]
> On Wed, 16 Feb 2011 14:37:26 -0500 Phil Turmel <philip@xxxxxxxxxx> wrote:
>> It occurred to me that if you go to the trouble (and space and
>> performance) to create and maintain metadata for lists of bad blocks,
>> and separate metadata for sync status aka "trim", or hot-replace
>> status, or reshape-status, or whatever features are dreamt up later,
>> why not create an infrastructure to carry all of it efficiently?
>>
>> David Brown suggested a multi-level metadata structure.  I concur, but
>> somewhat more generic:
>> Level 1:  Coarse bitmap, set bit indicates 'look at level 2'
>> Level 2:  Fine bitmap, set bit indicates 'look at level 3'
>> Level 3:  Extent list, with starting block, length, and feature payload
>>
>> The bitmap levels are purely for hot-path performance.
>>
>> As an option, it should be possible to spread the detailed metadata
>> through the data area, possibly in chunk-sized areas spread out at
>> some user-defined interval.  "meta-span", perhaps.  Then resizing
>> partitions that compose an array would be less likely to bump up
>> against metadata size limits.  The coarse bitmap should stay near the
>> superblock, of course.
>
> This is starting to sound a lot more like a filesystem than a RAID
> system.

Heh.  But if you are going to start adding block and/or block-extent
metadata for a variety of features, common code and storage for it
should be an all-around win.  (I've appended a rough sketch of the
lookup path I have in mind.)

> I really don't want there to be so much metadata that I am tempted to
> spread it out among the data.  I think that implies too much
> complexity.

It would be complex, yes.  Same math as computing block locations within
raid 5 stripes, though (also sketched at the end of this mail).

> Maybe that is a good place to draw the line:  If some metadata doesn't
> fit easily at the start or end of the devices, it has no place in RAID
> - you should add it to a filesystem instead.

I think that's arbitrary, but it's moot until someone tries to implement
it.

>> Personally, I'd like to see the bad-block feature actually perform
>> block remapping, much like hard drives themselves do, but with the
>> option to unmap the block if a later write succeeds.  Using one retry
>> per array restart as you described makes a lot of sense.  In any case,
>> remapping would retain redundancy where applicable, short of full
>> drive failure or remap overflow.
>
> If the hard drives already do this, why should md try to do it as
> well??  If a hard drive has had so many write errors that it has used
> up all of its spare space, then it is long past time to replace it.

True enough.

>> My $0.02, of course.
>
> Here in .au, the smallest legal tender is $0.05 - but thanks anyway :-)

I guess the offer of "a penny for your thoughts" doesn't work down
under ;)

Phil
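
P.S.  To make the three-level lookup concrete, here is a rough sketch in
C.  Everything in it is hypothetical -- the names, the region sizes, and
the extent layout are made up for illustration, not proposed as an
on-disk format:

#include <stdint.h>
#include <stddef.h>

#define COARSE_SHIFT 20  /* one coarse bit per 2^20 sectors (assumption) */
#define FINE_SHIFT   10  /* one fine bit per 2^10 sectors (assumption) */

struct meta_extent {
	uint64_t start;    /* first sector covered */
	uint32_t len;      /* sectors covered */
	uint32_t payload;  /* feature-specific data (bad-block, sync, ...) */
};

struct meta_map {
	unsigned long *coarse;        /* level 1 bitmap */
	unsigned long *fine;          /* level 2 bitmap */
	struct meta_extent *extents;  /* level 3: sorted, non-overlapping */
	size_t nr_extents;
};

static int test_bit_ul(const unsigned long *map, uint64_t bit)
{
	return (map[bit / (8 * sizeof(long))] >>
		(bit % (8 * sizeof(long)))) & 1;
}

/* Hot path: a clean region costs two bit tests and nothing else;
 * only marked regions fall through to the extent search. */
const struct meta_extent *meta_lookup(const struct meta_map *m,
				      uint64_t sector)
{
	size_t lo = 0, hi = m->nr_extents;

	if (!test_bit_ul(m->coarse, sector >> COARSE_SHIFT))
		return NULL;            /* level 1: nothing in this region */
	if (!test_bit_ul(m->fine, sector >> FINE_SHIFT))
		return NULL;            /* level 2: nothing in this region */

	while (lo < hi) {               /* level 3: binary search */
		size_t mid = lo + (hi - lo) / 2;
		const struct meta_extent *e = &m->extents[mid];

		if (sector < e->start)
			hi = mid;
		else if (sector >= e->start + e->len)
			lo = mid + 1;
		else
			return e;       /* sector is inside this extent */
	}
	return NULL;
}

The payload field is where bad-block state, sync state, hot-replace
state, and whatever comes later would all share the same storage and
lookup code.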
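
P.P.S.  And the meta-span math I was alluding to.  Again hypothetical:
assume a layout where a metadata chunk of 'meta' sectors follows every
'span' sectors of data.  Mapping a logical data sector to its physical
device sector is then one division, no worse than finding a block inside
a raid 5 stripe:

#include <stdint.h>

/* group = number of complete data spans (and hence interleaved metadata
 * chunks) that precede this data sector; skip over them. */
static uint64_t meta_span_to_phys(uint64_t data_sector,
				  uint64_t span, uint64_t meta)
{
	uint64_t group = data_sector / span;

	return data_sector + group * meta;
}

E.g. with span = 2048 and meta = 8, data sector 5000 lands at physical
sector 5000 + 2*8 = 5016.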