On Mon, May 08, 2017 at 03:31:47PM +0800, Coly Li wrote: > On 2017/5/8 上午8:36, Shaohua Li wrote: > > There are complaints that raid0 discard handling is slow. Currently we > > divide discard request into chunks and dispatch to underlayer disks. The > > block layer will do merge to form big requests. This causes a lot of > > request split/merge and uses significant CPU time. > > > > A simple idea is to calculate the range for each raid disk for an IO > > request and send a discard request to raid disks, which will avoid the > > split/merge completely. Previously Coly tried the approach, but the > > implementation was too complex because of raid0 zones. This patch always > > split bio in zone boundary and handle bio within one zone. It simplifies > > the implementation a lot. > > > > Cc: NeilBrown <neilb@xxxxxxxx> > > Cc: Coly Li <colyli@xxxxxxx> > > Signed-off-by: Shaohua Li <shli@xxxxxx> > > --- > > drivers/md/raid0.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++------- > > 1 file changed, 102 insertions(+), 14 deletions(-) > > > > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c > > index 84e5859..d6c0bc7 100644 > > --- a/drivers/md/raid0.c > > +++ b/drivers/md/raid0.c > > @@ -385,7 +385,7 @@ static int raid0_run(struct mddev *mddev) > > blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors); > > blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors); > > blk_queue_max_write_zeroes_sectors(mddev->queue, mddev->chunk_sectors); > > - blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors); > > + blk_queue_max_discard_sectors(mddev->queue, UINT_MAX); > > > > blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9); > > blk_queue_io_opt(mddev->queue, > > @@ -459,6 +459,95 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev, > > } > > } > > > > +static void raid0_handle_discard(struct mddev *mddev, struct bio *bio) > > +{ > > + struct r0conf *conf = mddev->private; > > + struct strip_zone *zone; > > + sector_t start = bio->bi_iter.bi_sector; > > + sector_t end; > > + unsigned int stripe_size; > > + sector_t first_stripe_index, last_stripe_index; > > + sector_t start_disk_offset; > > + unsigned int start_disk_index; > > + sector_t end_disk_offset; > > + unsigned int end_disk_index; > > + unsigned int disk; > > + > > + zone = find_zone(conf, &start); > > + > > + if (bio_end_sector(bio) > zone->zone_end) { > > + struct bio *split = bio_split(bio, > > + zone->zone_end - bio->bi_iter.bi_sector, GFP_NOIO, > > + mddev->bio_set); > > + bio_chain(split, bio); > > + generic_make_request(bio); > > + bio = split; > > + end = zone->zone_end; > > + } else > > + end = bio_end_sector(bio); > > + > > Nice, good idea. Indeed, I have a WIP patch which trys to follow your > idea, but no clean as yours. > > > > + if (zone != conf->strip_zone) > > + end = end - zone[-1].zone_end; > > + > > + /* Now start and end is the offset in zone */ > > + stripe_size = zone->nb_dev * mddev->chunk_sectors; > > + > > + first_stripe_index = start; > > + sector_div(first_stripe_index, stripe_size); > > + last_stripe_index = end; > > + sector_div(last_stripe_index, stripe_size); > > + > > + start_disk_index = (int)(start - first_stripe_index * stripe_size) / > > + mddev->chunk_sectors; > > + start_disk_offset = ((int)(start - first_stripe_index * stripe_size) % > > + mddev->chunk_sectors) + > > + first_stripe_index * mddev->chunk_sectors; > > + end_disk_index = (int)(end - last_stripe_index * stripe_size) / > > + mddev->chunk_sectors; > > + end_disk_offset = ((int)(end - last_stripe_index * stripe_size) % > > + mddev->chunk_sectors) + > > + last_stripe_index * mddev->chunk_sectors; > > + > > Nice code, again! Much simpler. It is enjoyed experience when reading > these calculation :-) > > > > + for (disk = 0; disk < zone->nb_dev; disk++) { > > + sector_t dev_start, dev_end; > > + struct bio *discard_bio = NULL; > > + struct md_rdev *rdev; > > + > > + if (disk < start_disk_index) > > + dev_start = (first_stripe_index + 1) * > > + mddev->chunk_sectors; > > + else if (disk > start_disk_index) > > + dev_start = first_stripe_index * mddev->chunk_sectors; > > + else > > + dev_start = start_disk_offset; > > + > > + if (disk < end_disk_index) > > + dev_end = (last_stripe_index + 1) * mddev->chunk_sectors; > > + else if (disk > end_disk_index) > > + dev_end = last_stripe_index * mddev->chunk_sectors; > > + else > > + dev_end = end_disk_offset; > > + > > + if (dev_end <= dev_start) > > + continue; > > + > > + rdev = conf->devlist[(zone - conf->strip_zone) * > > + conf->strip_zone[0].nb_dev + disk]; > > + if (__blkdev_issue_discard(rdev->bdev, > > + dev_start + zone->dev_start + rdev->data_offset, > > + dev_end - dev_start, GFP_NOIO, 0, &discard_bio) || > > + !discard_bio) > > + continue; > > Could you please give me any hint ? Why __blkdev_issue_discard() is > called here. The following generic_make_request() will issue the bio to > underlying device and call __blkdev_issue_discard() in its code path > eventually, why call this function explicitly here? I was going to allocate a bio, init it and dispatch it to raid disks, but found __blkdev_issue_discard essentially does these, so reuse it. did you see any problem? Thanks, Shaohua -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html