On Mon, May 11, 2015 at 09:03:51AM -0700, Shaohua Li wrote:
> > - What is the reason for retry_bio_list? If a driver returns an
> > I/O error to the higher levels it already has retried and came
> > to the conclusion this is a permanent error.
>
> The retry_bio_list is to handle io to cache disk. If IO to cache disk
> has error, it's not a permanent error here. The cache disk is a cache,
> We can still dispatch the IO to its final destination, the raid disks.

How does this work in practice? We've filled our cache disk with dirty
data, and it now returns non-correctable write errors. At this point we
had claimed to the caller that the data is on stable disk, but our cache
disk is toast now. Is it really a good idea to now start a large window
where we do not actually have the cache data on stable storage we can
get back at, but pretend business as usual?

IMHO the only sane way is to shut down the array when writes to the
cache disk fail. Hopefully the disk will still allow reading from it.

Note that to be on the safe side you'll need a mirrored cache disk
anyway.