Re: [PATCH 0/5] a caching layer for raid 5/6

Shaohua Li <shli@xxxxxx> · Tue, 12 May 2015 08:23:50 -0700

On Tue, May 12, 2015 at 12:18:54AM -0700, Christoph Hellwig wrote:
> On Mon, May 11, 2015 at 09:03:51AM -0700, Shaohua Li wrote:
> > >   - What is the reason for retry_bio_list?  If a driver returns an
> > >     I/O error to the higher levels, it has already retried and come
> > >     to the conclusion that this is a permanent error.
> > 
> > The retry_bio_list handles I/O to the cache disk. An I/O error on the
> > cache disk is not a permanent error here: the cache disk is only a
> > cache, so we can still dispatch the I/O to its final destination, the
> > raid disks.
> 
> How does this work in practice?  We've filled our cache disk with
> dirty data, and it now returns non-correctable write errors.  At this
> point we have claimed to the caller that the data is on stable storage,
> but our cache disk is now toast.  Is it really a good idea to open a large
> window where we do not actually have the cached data on stable storage
> that we can get back, while pretending it is business as usual?
> 
> IMHO the only sane way is to shut down the array when writes to the cache
> disk fail.  Hopefully the disk will still allow reading from it.  Note
> that to be on the safe side you'll need a mirrored cache disk anyway.

We have a memory pool here. All data that hasn't been flushed to the raid
disks is held in the pool, so if there is an I/O error on the cache disk,
we can still flush the data from the memory pool to the raid disks. The
pool is a limited resource, so this can reduce the I/O aggregation effect,
but we flush full-stripe data to the raid disks almost immediately, which
mitigates that a little. This makes the caching layer behave very much
like a hardware RAID card.
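To make the fallback concrete, here is a minimal userspace sketch of the idea described above, not the actual patch code. It assumes every unflushed write is also kept in an in-memory pool. If the cache disk write fails, the stripe is flushed from memory straight to the raid disks, and full stripes are flushed immediately. All names (pool_stripe, cached_write, cache_disk_write, raid_write_stripe), sizes, and the simulated disks are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STRIPE_DATA_DISKS 4 /* hypothetical: data chunks per stripe */
#define CHUNK_SIZE 16       /* hypothetical: tiny chunk size for illustration */
#define POOL_ENTRIES 8      /* hypothetical: memory pool capacity, in stripes */

/* One stripe's worth of not-yet-flushed data, held in memory. */
struct pool_stripe {
    bool in_use;
    unsigned long stripe_nr;
    unsigned char data[STRIPE_DATA_DISKS][CHUNK_SIZE];
    unsigned chunks_present; /* bitmask of chunks written so far */
};

static struct pool_stripe pool[POOL_ENTRIES];
static bool cache_disk_failed; /* simulated cache disk health */
static unsigned raid_flushes;  /* stripes written out to the raid disks */

/* Simulated cache disk write: returns -5 (like -EIO) once the disk is bad. */
static int cache_disk_write(const struct pool_stripe *s)
{
    (void)s;
    return cache_disk_failed ? -5 : 0;
}

/* Simulated flush of a stripe from memory to the raid disks; frees the slot. */
static void raid_write_stripe(struct pool_stripe *s)
{
    raid_flushes++;
    s->in_use = false;
    s->chunks_present = 0;
}

/* Find the pool slot for a stripe, or claim a free one; NULL if pool is full. */
static struct pool_stripe *pool_get(unsigned long stripe_nr)
{
    struct pool_stripe *free_slot = NULL;

    for (size_t i = 0; i < POOL_ENTRIES; i++) {
        if (pool[i].in_use && pool[i].stripe_nr == stripe_nr)
            return &pool[i];
        if (!pool[i].in_use && !free_slot)
            free_slot = &pool[i];
    }
    if (free_slot) {
        free_slot->in_use = true;
        free_slot->stripe_nr = stripe_nr;
        free_slot->chunks_present = 0;
    }
    return free_slot;
}

/* Returns 0 on success, -1 if the chunk is invalid or the pool is exhausted. */
int cached_write(unsigned long stripe_nr, unsigned chunk,
                 const unsigned char *buf)
{
    struct pool_stripe *s;
    bool full;

    if (chunk >= STRIPE_DATA_DISKS)
        return -1;
    s = pool_get(stripe_nr);
    if (!s)
        return -1; /* a real caller would wait for a flush to free a slot */

    memcpy(s->data[chunk], buf, CHUNK_SIZE);
    s->chunks_present |= 1u << chunk;
    full = s->chunks_present == (1u << STRIPE_DATA_DISKS) - 1;

    /* On a cache disk error the data is still in memory, so flush it to the
     * raid disks instead; full stripes are also flushed right away. */
    if (cache_disk_write(s) != 0 || full)
        raid_write_stripe(s);
    return 0;
}
```

The bounded pool is what makes this safe and also what limits it: once every slot holds unflushed data, new writes must wait, which is the reduced aggregation effect mentioned above.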

Thanks,
Shaohua