On 04/01/2018 1:59 AM, tang.junhui@xxxxxxxxxx wrote:
> From: Tang Junhui <tang.junhui@xxxxxxxxxx>
>
> Hello Coly,
>
> This patch is great!
>
> One tip:
> Could you replace c->io_disable with the already existing c->flags?
> Then we would only need to add a new macro such as CACHE_SET_IO_DISABLE.
>

Hi Junhui,

Your suggestion is cool! I will do it in the v2 set.

Thanks.

Coly Li

>> When too many I/Os fail on the cache device, bch_cache_set_error() is
>> called in the error handling code path to retire the whole problematic
>> cache set. If new I/O requests continue to arrive and take the refcount
>> dc->count, the cache set won't be retired immediately. This is a problem.
>>
>> Furthermore, several kernel threads and self-armed kernel works may
>> still be running after bch_cache_set_error() is called. It may take
>> quite a while for them to stop, or they may not stop at all. They also
>> prevent the cache set from being retired.
>>
>> The solution in this patch is to add a per cache set flag that disables
>> I/O requests on this cache and all attached backing devices. New
>> incoming I/O requests can then be rejected in *_make_request() before
>> taking the refcount, and kernel threads and self-armed kernel workers
>> can stop very quickly when io_disable is true.
>>
>> Because bcache also does internal I/O for writeback, garbage collection,
>> bucket allocation and journaling, this kind of I/O should be disabled
>> after bch_cache_set_error() is called. So closure_bio_submit() is
>> modified to check whether cache_set->io_disable is true. If
>> cache_set->io_disable is true, closure_bio_submit() will set
>> bio->bi_status to BLK_STS_IOERR and return; generic_make_request()
>> won't be called.
>>
>> A sysfs interface is also added for cache_set->io_disable, to read and
>> set the io_disable value for debugging. It is helpful for triggering
>> more corner case issues for a failed cache device.
>>
>> Signed-off-by: Coly Li <colyli@xxxxxxx>

[snip]
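
For illustration only, here is a rough user-space sketch of the idea discussed above: the disable state is kept as one bit in a flags word instead of a separate boolean, and the submit path checks that bit before passing a bio down. Every sketch_* name, the bit number and the status enum are made up for this example. It is not the bcache code and not the v2 patch; the kernel would use its own bit operations and BLK_STS_IOERR.

```c
/* User-space sketch only: all sketch_* names are hypothetical, not bcache code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit number inside the flags word; the value 3 is an arbitrary assumption. */
#define SKETCH_CACHE_SET_IO_DISABLE 3

/* Stand-ins for the block layer status codes. */
enum sketch_status { SKETCH_STS_OK = 0, SKETCH_STS_IOERR = 1 };

struct sketch_cache_set {
	atomic_ulong flags;      /* one bit per state, instead of a bool io_disable */
	unsigned long submitted; /* counts bios handed to the lower layer */
};

struct sketch_bio {
	enum sketch_status status;
};

static inline void sketch_cache_set_init(struct sketch_cache_set *c)
{
	atomic_init(&c->flags, 0);
	c->submitted = 0;
}

/* What an error handler might do once too many I/Os have failed. */
static inline void sketch_disable_io(struct sketch_cache_set *c)
{
	atomic_fetch_or(&c->flags, 1UL << SKETCH_CACHE_SET_IO_DISABLE);
}

static inline bool sketch_io_disabled(struct sketch_cache_set *c)
{
	return atomic_load(&c->flags) & (1UL << SKETCH_CACHE_SET_IO_DISABLE);
}

/*
 * Mirrors the described closure_bio_submit() behaviour: if I/O is
 * disabled, mark the bio as failed and return without submitting it.
 * Returns true only when the bio was passed on.
 */
static inline bool sketch_bio_submit(struct sketch_cache_set *c,
				     struct sketch_bio *bio)
{
	assert(c != NULL && bio != NULL);

	if (sketch_io_disabled(c)) {
		bio->status = SKETCH_STS_IOERR;
		return false;
	}

	bio->status = SKETCH_STS_OK;
	c->submitted++; /* stands in for the real submission call */
	return true;
}
```

In this sketch, once sketch_disable_io() has been called, every later sketch_bio_submit() returns false with the bio marked as failed, and the submitted counter stops growing. A request entry point could call the same sketch_io_disabled() check before taking any reference.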