Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> · Fri, 6 Dec 2019 00:04:02 +0000 (UTC)

On Tue, 3 Dec 2019, Coly Li wrote:

> On 2019/12/3 3:34 AM, Eric Wheeler wrote:
> > On Mon, 2 Dec 2019, Coly Li wrote:
> >> On 2019/12/2 6:24 PM, kungf wrote:
> >>> Data may be lost in the following writeback-mode scenario:
> >>> 1. client writes data1 to bcache
> >>> 2. client calls fdatasync
> >>> 3. bcache flushes the cache set and the backing device;
> >>>    if data1 has not been written back to the backing device yet, it is
> >>>    only guaranteed to be safe in the cache.
> >>> 4. the cache then writes data1 back to the backing device with only REQ_OP_WRITE
> >>> So data1 is not guaranteed to be in non-volatile storage, and it may be
> >>> lost on a power interruption.
> >>>
> >>
> >> Hi,
> >>
> >> Did you encounter such a problem in a real workload? With the bcache
> >> journal, I don't see the possibility of data loss from your description.
> >>
> >> Correct me if I am wrong.
> >>
> >> Coly Li
> > 
> > If this does become necessary, then we should have a sysfs or superblock 
> > flag to disable FUA for those with RAID BBUs.
> 
> Hi Eric,
> 
> I doubt it is necessary to add the FUA tag to all writeback bios. If a
> power failure happens after the dirty data is written to the backing
> device and the bkey turns clean, a following read request will still go
> to the cache device, because the LBA can be indexed whether the bkey is
> dirty or clean. Unless the bkey is invalidated from the B+tree, reads
> always go to the cache device first in writeback mode. If a power
> failure happens before the cached bkey turns from dirty to clean, the
> only cost is an extra writeback bio flushed from the cache device to the
> backing device with identical data. Compared with adding the FUA tag to
> all writeback bios (which would be really slow), the extra writeback I/Os
> after a power failure are more acceptable to me.
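
The two power-failure cases above can be sketched as a user-space toy
model. This is only an illustration of the argument, not bcache code:
toy_state, toy_read() and toy_replay_writeback() are hypothetical names,
and the model assumes the cache device itself is persistent and that dirty
keys are written back again after a restart.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model of one cached extent; not a bcache structure. */
struct toy_state {
	bool key_indexed;      /* bkey is still present in the btree */
	bool key_dirty;        /* bkey is marked dirty */
	char cache_data[16];   /* contents on the (persistent) cache device */
	char backing_data[16]; /* contents that survived on the backing device */
};

/* While the key is indexed, reads are served from the cache device,
 * whether the key is dirty or clean. */
const char *toy_read(const struct toy_state *s)
{
	assert(s != NULL);
	return s->key_indexed ? s->cache_data : s->backing_data;
}

/* After a restart, a key that is still dirty is simply written back again
 * with identical data, then marked clean. */
void toy_replay_writeback(struct toy_state *s)
{
	assert(s != NULL);
	if (s->key_indexed && s->key_dirty) {
		memcpy(s->backing_data, s->cache_data, sizeof(s->backing_data));
		s->key_dirty = false;
	}
}
```

With a clean, still-indexed key whose writeback never reached the backing
media, toy_read() still returns the cached copy; with a key that was still
dirty at the time of the failure, toy_replay_writeback() repeats the write.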

I agree.  FWIW, I just learned about /sys/block/sdX/queue/write_cache from 
Nikos Tsironis <ntsironis@xxxxxxxxxxx>.  Thus, my flag request for a FUA 
bypass isn't necessary anyway, even if you did want an FUA there, because 
FUAs are stripped when a blockdev is set to "write through" (that is, when 
QUEUE_FLAG_WC is clear).

----------------------------------------------------------------------
This happens in generic_make_request_checks():

              /*
               * Filter flush bio's early so that make_request based
               * drivers without flush support don't have to worry
               * about them.
               */
              if (op_is_flush(bio->bi_opf) &&
                  !test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
                      bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
                      if (!nr_sectors) {
                              status = BLK_STS_OK;
                              goto end_io;
                      }
              }
----------------------------------------------------------------------
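
For anyone who wants to poke at the condition outside the kernel, here is a
minimal user-space sketch of the same filter. The TOY_* constants and the
toy_* functions are hypothetical stand-ins (the bit values do not match the
kernel's), and the early completion of empty flush bios (the nr_sectors
check) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the kernel's values. */
#define TOY_REQ_OP_WRITE  (1u << 0)
#define TOY_REQ_FUA       (1u << 1)
#define TOY_REQ_PREFLUSH  (1u << 2)

/* Mirrors the idea of op_is_flush(): true if either flush-type flag is set. */
bool toy_op_is_flush(uint32_t opf)
{
	return (opf & (TOY_REQ_FUA | TOY_REQ_PREFLUSH)) != 0;
}

/* Returns the flags the driver would see. queue_has_wc models
 * test_bit(QUEUE_FLAG_WC, ...): when it is false, the flush flags are
 * dropped before the bio reaches the driver. */
uint32_t toy_filter_flush(uint32_t opf, bool queue_has_wc)
{
	if (toy_op_is_flush(opf) && !queue_has_wc)
		opf &= ~(TOY_REQ_PREFLUSH | TOY_REQ_FUA);
	return opf;
}
```

A write tagged with TOY_REQ_FUA comes back as a plain write when
queue_has_wc is false, and keeps the flag when it is true.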

-Eric

> 
> Coly Li
> 
> > 
> >>> Signed-off-by: kungf <wings.wyang@xxxxxxxxx>
> >>> ---
> >>>  drivers/md/bcache/writeback.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> >>> index 4a40f9eadeaf..e5cecb60569e 100644
> >>> --- a/drivers/md/bcache/writeback.c
> >>> +++ b/drivers/md/bcache/writeback.c
> >>> @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> >>>  	 */
> >>>  	if (KEY_DIRTY(&w->key)) {
> >>>  		dirty_init(w);
> >>> -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> >>> +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> >>>  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> >>>  		bio_set_dev(&io->bio, io->dc->bdev);
> >>>  		io->bio.bi_end_io	= dirty_endio;
> >>>
> >>
>