On 01/03/2018 03:03 PM, Coly Li wrote: > When there are too many I/O errors on cache device, current bcache code > will retire the whole cache set, and detach all bcache devices. But the > detached bcache devices are not stopped, which is problematic when bcache > is in writeback mode. > > If the retired cache set has dirty data of backing devices, continue > writing to bcache device will write to backing device directly. If the > LBA of write request has a dirty version cached on cache device, next time > when the cache device is re-registered and backing device re-attached to > it again, the stale dirty data on cache device will be written to backing > device, and overwrite latest directly written data. This situation causes > a quite data corruption. > > This patch checkes whether cache_set->io_disable is true in > __cache_set_unregister(). If cache_set->io_disable is true, it means cache > set is unregistering by too many I/O errors, then all attached bcache > devices will be stopped as well. If cache_set->io_disable is not true, it > means __cache_set_unregister() is triggered by writing 1 to sysfs file > /sys/fs/bcache/<UUID>/bcache/stop. This is an exception because users do > it explicitly, this patch keeps existing behavior and does not stop any > bcache device. > > Even the failed cache device has no dirty data, stopping bcache device is > still a desired behavior by many Ceph and data base users. Then their > application will report I/O errors due to disappeared bcache device, and > operation people will know the cache device is broken or disconnected. > > Signed-off-by: Coly Li <colyli@xxxxxxx> > --- > drivers/md/bcache/super.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c > index 49d6fedf89c3..20a7a6959506 100644 > --- a/drivers/md/bcache/super.c > +++ b/drivers/md/bcache/super.c > @@ -1458,6 +1458,14 @@ static void __cache_set_unregister(struct closure *cl) > dc = container_of(c->devices[i], > struct cached_dev, disk); > bch_cached_dev_detach(dc); > + /* > + * If we come here by too many I/O errors, > + * bcache device should be stopped too, to > + * keep data consistency on cache and > + * backing devices. > + */ > + if (c->io_disable) > + bcache_device_stop(c->devices[i]); > } else { > bcache_device_stop(c->devices[i]); > } > Reviewed-by: Hannes Reinecke <hare@xxxxxxxx> Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@xxxxxxx +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)