Re: 4.5-rc iser issues

Hi Sagi,

On Sun, Feb 14, 2016 at 11:22 PM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> Adding Ming to Cc.
>
> But I don't think simply not cloning the biovecs is the right thing
> to do in the end.  This must be something with the bvec iterators.

I agree with Christoph; there may well be an issue somewhere in that area.

> From the log:
> iser: sg[0] dma_addr:0x85FC06000 off:0x0 sz:0x200 dma_len:0x200
> iser: sg[1] dma_addr:0x860334000 off:0x0 sz:0x200 dma_len:0x200 <-- gap
> iser: sg[2] dma_addr:0x860335000 off:0x0 sz:0x200 dma_len:0x200 <-- gap

The above gaps shouldn't appear, since blk_bio_segment_split() splits
out a new bio when a gap is detected.
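
(For reference, the gap check inside blk_bio_segment_split() looks
roughly like below since v4.3; quoted from memory, so take it as a
sketch rather than the exact code:)

bio_for_each_segment(bv, bio, iter) {
	/*
	 * If the queue doesn't support SG gaps and adding this
	 * offset would create a gap, split the bio here.
	 */
	if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
		goto split;
	...
}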

Something like the following code can be added in the driver or in the
prep_fn to check whether the bvecs of the request are correct:

rq_for_each_segment(bvec, sc->request, iter) {
	/* check for a gap between this bvec and the previous one */
}
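
A slightly more complete sketch (untested; the helper name
check_rq_for_gaps() is just for illustration) that compares each bvec
against the previous one with bvec_gap_to_prev():

#include <linux/blkdev.h>	/* rq_for_each_segment, bvec_gap_to_prev */

static void check_rq_for_gaps(struct request *rq)
{
	struct req_iterator iter;
	struct bio_vec bvec, prev;
	bool has_prev = false;

	rq_for_each_segment(bvec, rq, iter) {
		/* a gap here means the queue's virt boundary is violated */
		if (has_prev &&
		    bvec_gap_to_prev(rq->q, &prev, bvec.bv_offset))
			pr_warn("gap in rq %p: prev off %u len %u, cur off %u\n",
				rq, prev.bv_offset, prev.bv_len,
				bvec.bv_offset);
		prev = bvec;
		has_prev = true;
	}
}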

I don't know how to use iser, but everything looks fine after I set the
virt boundary to 4095 for null_blk with the attached patch.

> Full quote for Ming:
>
> On Sun, Feb 14, 2016 at 04:02:18PM +0200, Sagi Grimberg wrote:
>>
>> >>I'm bisecting now, there are a couple of patches from Ming in
>> >>the area of the bio splitting code...
>> >>
>> >>CC'ing Ming, Linux-block and Linux-nvme as iser is identical to nvme
>> >>wrt the virtual boundary so I think nvme will break as well.

The bisected commit was merged in v4.3, and there doesn't seem to have
been any such report against nvme.

>> >
>> >Bisection reveals that this one is the culprit:
>> >
>> >commit 52cc6eead9095e2faf2ec7afc013aa3af1f01ac5
>> >Author: Ming Lei <ming.lei@xxxxxxxxxxxxx>
>> >Date:   Thu Sep 17 09:58:38 2015 -0600
>> >
>> >     block: blk-merge: fast-clone bio when splitting rw bios
>> >
>> >     biovecs has become immutable since v3.13, so it isn't necessary
>> >     to allocate biovecs for the new cloned bios, then we can save
>> >     one extra biovecs allocation/copy, and the allocation is often
>> >     not fixed-length and a bit more expensive.
>> >
>> >     For example, if the 'max_sectors_kb' of null blk's queue is set
>> >     as 16(32 sectors) via sysfs just for making more splits, this patch
>> >     can increase throughput by about ~70% in the sequential read test over
>> >     null_blk(direct io, bs: 1M).
>> >
>> >     Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
>> >     Cc: Kent Overstreet <kent.overstreet@xxxxxxxxx>
>> >     Cc: Ming Lin <ming.l@xxxxxxxxxxxxxxx>
>> >     Cc: Dongsu Park <dpark@xxxxxxxxxx>
>> >     Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
>> >
>> >     This fixes a performance regression introduced by commit 54efd50bfd,
>> >     and allows us to take full advantage of the fact that we have
>> >immutable
>> >     bio_vecs. Hand applied, as it rejected violently with commit
>> >     5014c311baa2.
>> >
>> >     Signed-off-by: Jens Axboe <axboe@xxxxxx>
>> >--
>>
>> Looks like there is a problem with bio_clone_fast()
>>
>> This change makes the problem go away:
>> --
>> diff --git a/block/bio.c b/block/bio.c
>> index dbabd48..5e93733 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -1791,7 +1791,7 @@ struct bio *bio_split(struct bio *bio, int sectors,
>>          * Discards need a mutable bio_vec to accommodate the payload
>>          * required by the DSM TRIM and UNMAP commands.
>>          */
>> -       if (bio->bi_rw & REQ_DISCARD)
>> +       if (1 || bio->bi_rw & REQ_DISCARD)
>>                 split = bio_clone_bioset(bio, gfp, bs);
>>         else
>>                 split = bio_clone_fast(bio, gfp, bs);
>> --
>>
>> Any thoughts?

I don't think bio_clone_fast() is wrong: it reuses the bvec table of the
original bio, drivers are not allowed to change that table, and they
should only access the bvecs through the standard iterator helpers.

Also, bio_for_each_segment_all() can't be used to iterate over a
cloned bio.
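
For example, iterating over a (possibly fast-cloned) bio should look
something like this (sketch only):

struct bio_vec bv;
struct bvec_iter iter;

/* honours bio->bi_iter, so it is safe on a fast clone that
 * shares the parent's bvec table */
bio_for_each_segment(bv, bio, iter) {
	/* use bv.bv_page, bv.bv_offset, bv.bv_len */
}

bio_for_each_segment_all() instead walks the raw bi_io_vec table and
ignores bi_iter, which is why it must not be used on a clone.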

Thanks,
diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 8ba1e97..5328f3c 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -10,6 +10,8 @@
 #include <linux/hrtimer.h>
 #include <linux/lightnvm.h>
 
+#define MAX_SG	128
+
 struct nullb_cmd {
 	struct list_head list;
 	struct llist_node ll_list;
@@ -19,6 +21,7 @@ struct nullb_cmd {
 	unsigned int tag;
 	struct nullb_queue *nq;
 	struct hrtimer timer;
+	struct scatterlist sg[MAX_SG];
 };
 
 struct nullb_queue {
@@ -351,6 +354,23 @@ static void null_request_fn(struct request_queue *q)
 	}
 }
 
+static int handle_sg(struct nullb_cmd *cmd)
+{
+	int num, i;
+	struct request *rq = cmd->rq;
+
+	num = blk_rq_map_sg(rq->q, rq, cmd->sg);
+	trace_printk("%s dump sg for %p(%d)\n", __func__, rq, rq->tag);
+	for (i = 0; i < num; i++) {
+		struct scatterlist *sg = &cmd->sg[i];
+		trace_printk("\t %4d: %u %u %llu\n",
+			     i, sg->offset, sg->length,
+			     (unsigned long long)sg->dma_address);
+	}
+
+	return 0;
+}
+
 static int null_queue_rq(struct blk_mq_hw_ctx *hctx,
 			 const struct blk_mq_queue_data *bd)
 {
@@ -365,6 +385,8 @@ static int null_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(bd->rq);
 
+	handle_sg(cmd);
+
 	null_handle_cmd(cmd);
 	return BLK_MQ_RQ_QUEUE_OK;
 }
@@ -707,6 +729,8 @@ static int null_add_dev(void)
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, nullb->q);
 	queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, nullb->q);
 
+	blk_queue_virt_boundary(nullb->q, (1UL << 12) - 1);
+	blk_queue_max_segments(nullb->q, MAX_SG);
 
 	mutex_lock(&lock);
 	list_add_tail(&nullb->list, &nullb_list);
