Re: [PATCH v3 00/13] bcache: device failure handling improvement

Nix <nix@xxxxxxxxxxxxx> · Wed, 24 Jan 2018 22:23:19 +0000

On 14 Jan 2018, Coly Li said:

> Hi maintainers and folks,
>
> This patch set tries to improve bcache device failure handling, including
> both cache device and backing device failures.
>
> The basic idea to handle failed cache device is,
> - Unregister cache set
> - Detach all backing devices which are attached to this cache set
> - Stop all the detached bcache devices
> - Stop all flash-only volumes on the cache set
> The above process is named 'cache set retire' by me. The result of cache
> set retire is, cache set and bcache devices are all removed, following
> I/O requests will get failed immediately to notify upper layer or user
> space code that the cache device is failed or disconnected.

This feels wrong to me. If a cache device is writethrough, the cache is
a pure optimization: having such a device fail should not lead to I/O
failures of any sort, but should only flip the cache device to 'none' so
that writes to the backing store simply don't get cached any more.

Anything else leads to a reliability reduction, since in the end cache
devices *will* fail.
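
To make that concrete, here is a rough sketch of the policy I have in
mind. Every name in it (the enums, the struct, on_cache_failure()) is
made up for illustration and is not an actual bcache symbol. What to do
for the non-writethrough modes is not something I'm arguing about here;
the sketch simply assumes they fall back to the retire behaviour the
patch set proposes.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not real bcache code. */
enum cache_mode {
    MODE_WRITETHROUGH,
    MODE_WRITEBACK,
    MODE_WRITEAROUND,
    MODE_NONE
};

enum failure_action {
    ACTION_BYPASS_CACHE, /* keep the device up, serve I/O from backing store */
    ACTION_RETIRE        /* 'cache set retire' as described in the patch set */
};

struct cached_dev_state {
    enum cache_mode mode;
};

/*
 * Decide what to do when the cache device behind this bcache device fails.
 * In writethrough mode the backing store already holds every write, so the
 * cache is a pure optimization: flip the mode to 'none' and carry on.
 * Assumption: all other modes keep the retire behaviour from the patch set.
 */
static enum failure_action on_cache_failure(struct cached_dev_state *dc)
{
    switch (dc->mode) {
    case MODE_WRITETHROUGH:
        dc->mode = MODE_NONE;   /* stop caching, no I/O errors */
        return ACTION_BYPASS_CACHE;
    case MODE_NONE:
        return ACTION_BYPASS_CACHE; /* nothing was cached anyway */
    default:
        return ACTION_RETIRE;
    }
}
```

The point is only that the decision depends on the cache mode: a
writethrough device ends up in mode 'none' and upper layers never see an
I/O error caused by the cache going away.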

-- 
NULL && (void)