Re: [PATCH for-4.4] block: split bios to max possible length

Hi Keith,

On Tue, Jan 5, 2016 at 11:09 PM, Keith Busch <keith.busch@xxxxxxxxx> wrote:
> On Tue, Jan 05, 2016 at 12:54:53PM +0800, Ming Lei wrote:
>> On Tue, Jan 5, 2016 at 2:24 AM, Keith Busch <keith.busch@xxxxxxxxx> wrote:
>> > This allows bio splits in the middle of a vector to form the largest
>>
>> With respect to the current block stack, one segment always holds one
>> or more whole bvecs, never part of a bvec, so we should be careful
>> with this handling.
>
> Hi Ming,
>
> Could you help me understand your concern here? If we split a vector
> somewhere in the middle, it becomes two different bvecs. The first is
> the last segment in the first bio, the second is the first segment in
> the split bio, right?

Firstly, before bio splitting was introduced we never split inside a
single bio vector.

Secondly, the current bio split code still doesn't support splitting a
single bvec in two; it simply makes the two bios share the original
bvec table. Please see bio_split(), which calls bio_clone_fast() to do
that, and the bvec table is immutable at that point.
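
For reference, here is roughly what bio_split() does in the 4.4-era
kernel (paraphrased and trimmed, not verbatim source): the clone and
the original keep pointing at the same immutable bvec table, and only
their iterators differ.

struct bio *bio_split(struct bio *bio, int sectors,
		      gfp_t gfp, struct bio_set *bs)
{
	struct bio *split;

	/* the clone shares bio->bi_io_vec with the original */
	split = bio_clone_fast(bio, gfp, bs);
	if (!split)
		return NULL;

	/* the first half covers 'sectors' worth of data ... */
	split->bi_iter.bi_size = sectors << 9;

	/* ... and the original's iterator is advanced past it */
	bio_advance(bio, split->bi_iter.bi_size);

	return split;
}

So a "split in the middle of a bvec" only means that one bio's iterator
ends, and the other's begins, inside the same bvec entry; the entry
itself is never rewritten.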

>
> It's not necessarily a new segment if it is physically contiguous with
> the previous (if it exists at all), but duplicating the logic to coalesce
> addresses doesn't seem to be worth that optimization.

I understand your motivation in the two patches. Actually, before bio
splitting was introduced we skipped sg merge for nvme because of the
NO_SG_MERGE flag, which has been ignored since bio splitting came in.
So could you check whether nvme performance is still good once
NO_SG_MERGE is honoured again in blk_bio_segment_split()? The change
should be simple, like the patch attached below.
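
For context, this is roughly where NO_SG_MERGE used to take effect
(paraphrased from memory of that era's block/blk-merge.c, so details
may differ):

void blk_recount_segments(struct request_queue *q, struct bio *bio)
{
	unsigned short seg_cnt;

	/* estimate the segment count by bi_vcnt for a non-cloned bio */
	if (bio_flagged(bio, BIO_CLONED))
		seg_cnt = bio_segments(bio);
	else
		seg_cnt = bio->bi_vcnt;

	if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
			(seg_cnt < queue_max_segments(q)))
		/* NO_SG_MERGE: each bvec counts as its own segment */
		bio->bi_phys_segments = seg_cnt;
	else {
		struct bio *nxt = bio->bi_next;

		bio->bi_next = NULL;
		bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, false);
		bio->bi_next = nxt;
	}

	bio_set_flag(bio, BIO_SEG_VALID);
}

Since blk_queue_split() now computes bi_phys_segments itself and marks
the bio BIO_SEG_VALID, this shortcut never gets a chance to run, which
is what "ignored" means above.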

>
>> > possible bio at the h/w's desired alignment, and guarantees the bio being
>> > split will have some data. Previously, if the first vector's length was
>> > greater than the allowable amount, the bio would split at a zero length
>> > and hit a kernel BUG.
>>
>> That was introduced by d3805611130a; a zero-length split couldn't
>> happen before it because queue_max_sectors() is at least one
>> PAGE_SIZE.
>
> Can a bvec's length exceed a PAGE_SIZE? They point to pages, so I
> suppose not.

No, it can't, but blk_max_size_offset() may return less than PAGE_SIZE,
and that is what triggers the zero-length split.
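
To spell that out (the helper below is paraphrased, and the
chunk_sectors value is only an illustration): when a chunk size is set,
blk_max_size_offset() caps an I/O at the next chunk boundary, and that
cap can be smaller than the first bvec.

static inline unsigned int blk_max_size_offset(struct request_queue *q,
					       sector_t offset)
{
	if (!q->limits.chunk_sectors)
		return q->limits.max_sectors;

	/* sectors remaining until the next chunk boundary */
	return q->limits.chunk_sectors -
			(offset & (q->limits.chunk_sectors - 1));
}

/*
 * Example: chunk_sectors = 4 (2KB chunks), the bio starts on a chunk
 * boundary, and the first bvec is a full 4KB page (8 sectors):
 *
 *	limit = 4 - (0 & 3) = 4
 *	sectors == 0 and bv.bv_len >> 9 == 8, so 0 + 8 > 4
 *
 * blk_bio_segment_split() then jumps to its split label with
 * sectors == 0, and bio_split(bio, 0, ...) trips its BUG_ON.
 */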

>
> But it should be more efficient to split to the largest allowed by the
> hardware. We can contrive a scenario where a bio would be split many

Previously Jens took the opposite approach, treating each bvec as one
segment, and he mentioned that performance improved.

> times more than necessary without this patch. Let's say queue_max_sectors
> is a PAGE_SIZE, and we want to submit '2 * PAGE_SIZE' worth of data
> addressed in 3 bvecs. Previously that would split three times; now it
> will split only twice.
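
To make that arithmetic concrete (the bvec layout below is my own
illustration, not taken from the mail above): let queue_max_sectors be
8 (4KB) and submit 8KB in three bvecs of 2KB + 4KB + 2KB, starting
aligned.

	Old behaviour (split only at bvec boundaries):
		bio 1: 2KB	(adding the 4KB bvec would exceed 4KB)
		bio 2: 4KB	(adding the 2KB bvec would exceed 4KB)
		bio 3: 2KB
	-> three bios

	With this patch (split mid-bvec at the allowed maximum):
		bio 1: 4KB	(2KB bvec + first half of the 4KB bvec)
		bio 2: 4KB	(rest of the 4KB bvec + the 2KB bvec)
	-> two bios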

Even so, IMO splitting is quite cheap; alternatively, we could raise
queue_max_sectors() to the value the hardware allows.


-- 
Ming Lei
diff --git a/block/blk-merge.c b/block/blk-merge.c
index e73846a..64fbbba 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -79,9 +79,13 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	unsigned front_seg_size = bio->bi_seg_front_size;
 	bool do_split = true;
 	struct bio *new = NULL;
+	bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
 
 	bio_for_each_segment(bv, bio, iter) {
-		if (sectors + (bv.bv_len >> 9) > blk_max_size_offset(q, bio->bi_iter.bi_sector))
+		if (no_sg_merge)
+			goto new_segment;
+
+		if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
 			goto split;
 
 		/*
