Re: [PATCH 01/10] block: don't decrement nr_phys_segments for physically contigous segments

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 14, 2019 at 07:14:41AM +0200, Christoph Hellwig wrote:
> On Tue, May 14, 2019 at 12:36:43PM +0800, Ming Lei wrote:
> > > > Some workloads need this optimization, please see 729204ef49ec00b
> > > > ("block: relax check on sg gap"):
> > > 
> > > And we still allow to merge the segments with this patch.  The only
> > > difference is that these merges do get accounted as extra segments.
> > 
> > It is easy for .nr_phys_segments to reach the max segment limit by this
> > way, then no new bio can be merged any more.
> 
> As said in my other mail we only decremented it for request merges
> in the non-gap case before and no one complained.

However we still may make it better, for example, the attached patch can
save 10~20% time when running 'mkfs.ntfs -s 512 /dev/vda', lots of small
request(63KB) can be merged to big IO(1MB).

> 
> > We don't consider segment merge between two bios in ll_new_hw_segment(),
> > in my mkfs test over virtio-blk, request size can be increased to ~1M(several
> > segments) from 63k(126 bios/segments) easily if the segment merge between
> > two bios is considered.
> 
> With the gap devices we have unlimited segment size, see my next patch
> to actually enforce that.  Which is much more efficient than using

But this patch does effect on non-gap device, and actually most of
device are non-gap type.

> multiple segments.  Also instead of hacking up the merge path even more
> we can fix the block device buffered I/O path to submit large I/Os
> instead of relying on merging like we do in the direct I/O code and every
> major file system.  I have that on my plate as a todo list item.
> 
> > > We do that in a couple of places.  For one the nvme single segment
> > > optimization that triggered this bug.  Also for range discard support
> > > in nvme and virtio.  Then we have loop that  iterate the segments, but
> > > doesn't use the nr_phys_segments count, and plenty of others that
> > > iterate over pages at the moment but should be iterating bvecs,
> > > e.g. ubd or aoe.
> > 
> > Seems discard segment doesn't consider bios merge for nvme and virtio,
> > so it should be fine in this way. Will take a close look at nvme/virtio
> > discard segment merge later.
> 
> I found the bio case by looking at doing the proper accounting in the
> bio merge path and hitting KASAN warning due to the range kmalloc.
> So that issue is real as well.

I guess the following patch may fix it:

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index f1d90cd3dc47..a913110de389 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -175,7 +175,7 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr,
 
 static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
 {
-	unsigned short segments = blk_rq_nr_discard_segments(req);
+	unsigned short segments = 0;
 	unsigned short n = 0;
 	struct virtio_blk_discard_write_zeroes *range;
 	struct bio *bio;
@@ -184,6 +184,9 @@ static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
 	if (unmap)
 		flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP;
 
+	__rq_for_each_bio(bio, req)
+		segments++;
+
 	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
 	if (!range)
 		return -ENOMEM;


Thanks,
Ming
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 21e87a714a73..af122efeb28d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -622,6 +622,8 @@ static inline int ll_new_hw_segment(struct request_queue *q,
 				    struct bio *bio)
 {
 	int nr_phys_segs = bio_phys_segments(q, bio);
+	unsigned int seg_size =
+		req->biotail->bi_seg_back_size + bio->bi_seg_front_size;
 
 	if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(q))
 		goto no_merge;
@@ -629,11 +631,15 @@ static inline int ll_new_hw_segment(struct request_queue *q,
 	if (blk_integrity_merge_bio(q, req, bio) == false)
 		goto no_merge;
 
-	/*
-	 * This will form the start of a new hw segment.  Bump both
-	 * counters.
-	 */
-	req->nr_phys_segments += nr_phys_segs;
+	if (blk_phys_contig_segment(q, req->biotail, bio)) {
+		if (nr_phys_segs)
+			req->nr_phys_segments += nr_phys_segs - 1;
+		if (nr_phys_segs == 1)
+			bio->bi_seg_back_size = seg_size;
+		if (req->nr_phys_segments == 1)
+			req->bio->bi_seg_front_size = seg_size;
+	} else
+		req->nr_phys_segments += nr_phys_segs;
 	return 1;
 
 no_merge:
-- 
2.20.1


[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux