Re: [RFC] raid5: add a log device to fix raid5/6 write hole issue

Shaohua Li <shli@xxxxxx> · Wed, 8 Apr 2015 23:15:47 -0700

On Thu, Apr 09, 2015 at 03:04:59PM +1000, NeilBrown wrote:
> On Wed, 8 Apr 2015 17:43:11 -0700 Shaohua Li <shli@xxxxxx> wrote:
> 
> > Hi,
> > This is what I'm working on now, and I hopefully will have the basic code
> > running next week. The new design will do caching and fix the write hole
> > issue too. Before I post the code, I'd like to check whether the design
> > has any obvious issues.
> 
> I can't say I'm excited about it....
> 
> You still haven't explained why you would ever want to read data from the
> "cache"?  Why not just keep everything in the stripe-cache until it is safe
> in the RAID.   I asked before and you said:
> 
> >> I'm not enthusiastic about using the stripe cache, though; we can't keep
> >> all data in the stripe cache. What we really need is an index.
> 
> which is hardly an answer.  Why can't you keep all the data in the stripe
> cache?  How much data is there? How much memory can you afford to dedicate?
> 
> To be unable to keep everything in memory, you must have some very long,
> sustained bursts of writes that arrive much faster than the RAID can accept them.
> 
> 
> Your cache layout seems very rigid.  I would much rather a layout that was
> very general and flexible.  If you want to always allocate a chunk at a time
> then fine, but don't force that on the cache layout.
> 
> The log really should be very simple.  A block describing what comes next,
> then lots of data/parity.  Then another block and more data etc etc.
> Each metadata block points to the next one.
> If you need an index of the cache, you keep that in memory.  On restart, you
> read all of the metadata blocks and build up the index.
> 
> I think that space in the log should be reclaimed in exactly the order that
> it is written, so the active part of the log is contiguous.   Obviously
> individual blocks become inactive in arbitrary order as they are written to
> the RAID, but each extent of the log becomes free in order.
> If you want that to happen out of order, you would need to present a very
> good reason.

I came to the same idea when I was thinking about a caching layer, but the
memory size is the main blocking issue. If the solution requires a large
amount of extra memory, it isn't cost-effective, and that makes it a hard
sell to replace hardware RAID with software RAID. The design depends
entirely on whether we can store all the data in memory. I don't have an
answer yet for how much memory we would need to make the aggregation
efficient; I guess only numbers can tell. I'll try to collect some data and
get back to you.

Thanks,
Shaohua