Re: [PATCH RESEND] md: Make flush bios explicitely sync

On Wed 24-05-17 16:22:36, Shaohua Li wrote:
> On Wed, May 24, 2017 at 01:40:13PM +0200, Jan Kara wrote:
> > Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
> > synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
> > definitions.  generic_make_request_checks() however strips REQ_FUA and
> > REQ_PREFLUSH flags from a bio when the storage doesn't report a volatile
> > write cache, and thus the write effectively becomes asynchronous, which
> > can lead to performance regressions.
> > 
> > Fix the problem by making sure all bios which are synchronous are
> > properly marked with REQ_SYNC.
> 
> DM and MD are different trees, so you should probably split this into two
> patches.

OK, I can do that.

> For the md part (md.c, raid5-cache.c), some places which use REQ_FUA
> are missed, like raid5.c and raid5-ppl.c

So ops_run_io() in raid5.c only copies REQ_FUA from some internal raid5
flags. My thinking was that we want to just propagate whatever we were
instructed to do here.

The case in ppl_write_empty_header() is clearly missed, I'll fix that.
Thanks. I'm not quite sure about ppl_submit_iounit() - I don't see a place
where we are waiting for those bios to complete. If it is likely to happen
soon after bio submission, we should add REQ_SYNC there.

> Can't remember if others asked the question in your first post, sorry,
> but why don't we add REQ_SYNC in generic_make_request_checks() if we are
> going to strip REQ_FUA, REQ_PREFLUSH? That would be less error prone.

Well, strictly speaking users of REQ_FUA do not necessarily have to use
REQ_SYNC. These are two orthogonal things - one is a request for
bypassing the disk cache, the other is a hint to the IO scheduler that there
is someone waiting for the IO to complete. Most of the time you wait for a
REQ_FUA request immediately but I can see some uses in filesystems
where we might want to submit a REQ_FUA request in the background (like when
doing background cleaning of the journal).
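
To make the distinction concrete, here is a rough userspace model of the
behaviour discussed above. It is only a sketch: the MODEL_* flag values and
the model_*() helpers are made up for illustration and are not the kernel's
definitions. model_is_sync() loosely mirrors the idea that a read, or a
write carrying any of SYNC/FUA/PREFLUSH, counts as synchronous, and
model_strip_cache_flags() mimics the stripping of FUA/PREFLUSH for devices
without a volatile write cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; NOT the kernel's values. */
#define MODEL_OP_READ	0x00u
#define MODEL_OP_WRITE	0x01u
#define MODEL_OP_MASK	0xffu
#define MODEL_SYNC	(1u << 8)
#define MODEL_FUA	(1u << 9)
#define MODEL_PREFLUSH	(1u << 10)

/* Reads, and writes carrying SYNC, FUA or PREFLUSH, count as synchronous. */
static bool model_is_sync(unsigned int opf)
{
	return (opf & MODEL_OP_MASK) == MODEL_OP_READ ||
	       (opf & (MODEL_SYNC | MODEL_FUA | MODEL_PREFLUSH)) != 0;
}

/* Without a volatile write cache, FUA and PREFLUSH are dropped. */
static unsigned int model_strip_cache_flags(unsigned int opf,
					    bool has_volatile_cache)
{
	if (!has_volatile_cache)
		opf &= ~(MODEL_FUA | MODEL_PREFLUSH);
	return opf;
}

/* Walk through the three cases discussed in this thread. */
static void model_demo(void)
{
	unsigned int fua_only = MODEL_OP_WRITE | MODEL_FUA;
	unsigned int fua_sync = MODEL_OP_WRITE | MODEL_SYNC | MODEL_FUA;

	/* Before stripping, a FUA write is treated as synchronous. */
	assert(model_is_sync(fua_only));

	/* No volatile cache: FUA is stripped and the write turns async. */
	assert(!model_is_sync(model_strip_cache_flags(fua_only, false)));

	/* An explicit SYNC survives the stripping. */
	assert(model_is_sync(model_strip_cache_flags(fua_sync, false)));
}
```

In this model a background submitter (e.g. journal cleaning) would pass
MODEL_FUA alone and accept that the write may be treated as asynchronous,
while a caller that waits for completion right away would add MODEL_SYNC
itself.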

								Honza

> > CC: linux-raid@xxxxxxxxxxxxxxx
> > CC: Shaohua Li <shli@xxxxxxxxxx>
> > CC: Mike Snitzer <snitzer@xxxxxxxxxx>
> > CC: dm-devel@xxxxxxxxxx
> > Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
> > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > ---
> >  drivers/md/dm-snap-persistent.c | 3 ++-
> >  drivers/md/md.c                 | 2 +-
> >  drivers/md/raid5-cache.c        | 4 ++--
> >  3 files changed, 5 insertions(+), 4 deletions(-)
> > 
> > Guys, I don't know enough about DM/MD to judge whether I've identified all the
> > places that want REQ_SYNC right. Can you please have a look?
> > 
> > diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c
> > index b93476c3ba3f..b92ab4cb0710 100644
> > --- a/drivers/md/dm-snap-persistent.c
> > +++ b/drivers/md/dm-snap-persistent.c
> > @@ -741,7 +741,8 @@ static void persistent_commit_exception(struct dm_exception_store *store,
> >  	/*
> >  	 * Commit exceptions to disk.
> >  	 */
> > -	if (ps->valid && area_io(ps, REQ_OP_WRITE, REQ_PREFLUSH | REQ_FUA))
> > +	if (ps->valid && area_io(ps, REQ_OP_WRITE,
> > +				 REQ_SYNC | REQ_PREFLUSH | REQ_FUA))
> >  		ps->valid = 0;
> >  
> >  	/*
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 10367ffe92e3..212a6777ff31 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -765,7 +765,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
> >  	    test_bit(FailFast, &rdev->flags) &&
> >  	    !test_bit(LastDev, &rdev->flags))
> >  		ff = MD_FAILFAST;
> > -	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA | ff;
> > +	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH | REQ_FUA | ff;
> >  
> >  	atomic_inc(&mddev->pending_writes);
> >  	submit_bio(bio);
> > diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> > index 4c00bc248287..0a7af8b0a80a 100644
> > --- a/drivers/md/raid5-cache.c
> > +++ b/drivers/md/raid5-cache.c
> > @@ -1782,7 +1782,7 @@ static int r5l_log_write_empty_meta_block(struct r5l_log *log, sector_t pos,
> >  	mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
> >  					     mb, PAGE_SIZE));
> >  	if (!sync_page_io(log->rdev, pos, PAGE_SIZE, page, REQ_OP_WRITE,
> > -			  REQ_FUA, false)) {
> > +			  REQ_SYNC | REQ_FUA, false)) {
> >  		__free_page(page);
> >  		return -EIO;
> >  	}
> > @@ -2388,7 +2388,7 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
> >  		mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
> >  						     mb, PAGE_SIZE));
> >  		sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
> > -			     REQ_OP_WRITE, REQ_FUA, false);
> > +			     REQ_OP_WRITE, REQ_SYNC | REQ_FUA, false);
> >  		sh->log_start = ctx->pos;
> >  		list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
> >  		atomic_inc(&log->stripe_in_journal_count);
> > -- 
> > 2.12.0
> > 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR