Re: [RFC] raid5: add a log device to fix raid5/6 write hole issue

On Wed, Apr 8, 2015 at 11:15 PM, Shaohua Li <shli@xxxxxx> wrote:
> On Thu, Apr 09, 2015 at 03:04:59PM +1000, NeilBrown wrote:
>> On Wed, 8 Apr 2015 17:43:11 -0700 Shaohua Li <shli@xxxxxx> wrote:
>>
>> > Hi,
>> > This is what I'm working on now, and I hope to have the basic code
>> > running next week. The new design will do caching and fix the write hole
>> > issue too. Before I post the code, I'd like to check whether the design
>> > has any obvious issues.
>>
>> I can't say I'm excited about it....
>>
>> You still haven't explained why you would ever want to read data from the
>> "cache". Why not just keep everything in the stripe-cache until it is safe
>> in the RAID?  I asked before and you said:
>>
>> >> I'm not enthusiastic to use stripe cache though, we can't keep all data
>> >> in stripe cache. What we really need is an index.
>>
>> which is hardly an answer.  Why can't you keep all the data in the stripe
>> cache?  How much data is there? How much memory can you afford to dedicate?
>>
>> You must have some very long sustained bursts of writes, much faster than
>> the RAID can accept, to be unable to keep everything in memory.
>>
>>
>> Your cache layout seems very rigid.  I would much rather have a layout that is
>> very general and flexible.  If you want to always allocate a chunk at a time
>> then fine, but don't force that on the cache layout.
>>
>> The log really should be very simple.  A block describing what comes next,
>> then lots of data/parity.  Then another block and more data etc etc.
>> Each metadata  block points to the next one.
>> If you need an index of the cache, you keep that in memory.  On restart, you
>> read all of the metadata blocks and build up the index.
>>
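For what it's worth, a minimal sketch of that layout (not md code -- the
struct names, the field layout and the helpers log_read_block() and
index_insert() are all made up for illustration, and it assumes sequence
numbers start at 1):

#include <stdint.h>

#define LOG_META_MAGIC  0x6d646c6fU

/* One entry per data/parity block that follows the metadata block. */
struct log_entry {
        uint64_t raid_sector;   /* destination sector in the array */
        uint32_t sectors;       /* payload length */
        uint32_t flags;         /* e.g. data vs. parity */
};

/* Written at the head of every metadata block in the log. */
struct log_meta_block {
        uint32_t magic;         /* sanity check during recovery */
        uint32_t checksum;      /* covers the header and its entries */
        uint64_t seq;           /* monotonically increasing */
        uint64_t next_offset;   /* log offset of the next metadata block */
        uint32_t nr_entries;
        struct log_entry entries[];     /* the data/parity itself follows */
};

/* Hypothetical helpers, declared here only so the sketch is complete. */
struct log_meta_block *log_read_block(uint64_t log_offset);
void index_insert(uint64_t raid_sector, uint64_t log_offset, uint32_t idx);

/* On restart: walk the metadata chain and rebuild the in-memory index. */
void log_recover(uint64_t start_offset)
{
        uint64_t off = start_offset;
        uint64_t expect_seq = 1;

        for (;;) {
                struct log_meta_block *mb = log_read_block(off);

                if (!mb || mb->magic != LOG_META_MAGIC || mb->seq != expect_seq)
                        break;                  /* end of the valid log */

                for (uint32_t i = 0; i < mb->nr_entries; i++)
                        index_insert(mb->entries[i].raid_sector, off, i);

                expect_seq++;
                off = mb->next_offset;          /* each block points to the next */
        }
}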
>> I think that space in the log should be reclaimed in exactly the order that
>> it is written, so the active part of the log is contiguous.   Obviously
>> individual blocks become inactive in arbitrary order as they are written to
>> the RAID, but each extent of the log becomes free in order.
>> If you want that to happen out of order, you would need to present a very
>> good reason.
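And a sketch of that in-order reclaim (again with made-up names;
extent_is_clean() and log_write_super() are hypothetical, and wraparound
of the circular log is omitted): the active region is the contiguous
window [tail, head); individual stripes go clean in any order, but the
tail only moves past the oldest extent once everything in it is safe on
the RAID disks.

#include <stdbool.h>
#include <stdint.h>

#define LOG_EXTENT_SECTORS      2048    /* illustrative: reclaim in 1MB extents */

static uint64_t log_tail;       /* oldest still-active log offset */
static uint64_t log_head;       /* next append position */

/* Hypothetical: true once every block in the extent starting at 'off'
 * has been written back to the RAID disks. */
bool extent_is_clean(uint64_t off);

/* Hypothetical: persist the new tail (e.g. in the log superblock)
 * before the space is reused. */
void log_write_super(uint64_t new_tail);

void log_reclaim(void)
{
        while (log_tail != log_head && extent_is_clean(log_tail)) {
                log_tail += LOG_EXTENT_SECTORS; /* free strictly in write order */
                log_write_super(log_tail);
        }
}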
>
> I came to the same idea when I was thinking about a caching layer, but
> memory size is the main blocking issue. If the solution requires a large
> amount of extra memory, it's not cost-effective, which makes it a hard
> sell to replace hardware RAID with software RAID. The design completely
> depends on whether we can store all the data in memory. I don't have an
> answer yet for how much memory we should use to make the aggregation
> efficient. I guess only numbers can tell. I'll try to collect some data
> and get back to you.
>

Another consideration to keep in mind is persistent memory.  I'm
working on an in-kernel mechanism to claim and map pmem, and a
raid-write-cache is an obvious first application.  I'll include you on
the initial submission of that capability.
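For illustration only (this is not the in-kernel mechanism above, and
/dev/pmem0 plus the 64MB mapping size are assumptions): because pmem is
byte-addressable, a write cache kept on it can be mapped and updated in
place instead of being driven through block I/O.

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/dev/pmem0", O_RDWR);    /* assumed pmem device node */
        if (fd < 0)
                return 1;

        size_t len = 64UL << 20;                /* map 64MB of the cache region */
        uint8_t *cache = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
        if (cache == MAP_FAILED) {
                close(fd);
                return 1;
        }

        /* Stripe data and its index can be updated in place... */
        memset(cache, 0, 4096);
        /* ...and made durable without a block-layer round trip. */
        msync(cache, 4096, MS_SYNC);

        munmap(cache, len);
        close(fd);
        return 0;
}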