On Tue, Jul 21, 2020 at 10:10 PM Xiao Ni <xni@xxxxxxxxxx> wrote:
>
> On 07/22/2020 10:43 AM, Xiao Ni wrote:
> >
> > On 07/22/2020 08:17 AM, Song Liu wrote:
> >> On Sun, Jul 19, 2020 at 10:38 PM Coly Li <colyli@xxxxxxx> wrote:
> >>> On 2020/7/20 13:26, Coly Li wrote:
> >>>> On 2020/7/20 12:58, Xiao Ni wrote:
> >>>>> Currently the discard request is split by chunk size, so it takes a
> >>>>> long time to finish mkfs on disks that support the discard function.
> >>>>> This patch set improves the handling of raid10 discard requests. It
> >>>>> uses a similar approach to raid0 (commit 29efc390b).
> >>>>>
> >>>>> But it is a little more complex than raid0, because raid10 has a
> >>>>> different layout. If raid10 uses the offset layout and the discard
> >>>>> request is smaller than the stripe size, there are holes when we
> >>>>> submit discard bios to the underlying disks.
> >>>>>
> >>>>> For example, with five disks (disk1 - disk5):
> >>>>>     D01 D02 D03 D04 D05
> >>>>>     D05 D01 D02 D03 D04
> >>>>>     D06 D07 D08 D09 D10
> >>>>>     D10 D06 D07 D08 D09
> >>>>> Suppose the discard bio only wants to discard from D03 to D10. For
> >>>>> disk3 there is a hole between D03 and D08, and for disk4 there is a
> >>>>> hole between D04 and D09. D03 is a single chunk, and
> >>>>> raid10_write_request can handle one chunk perfectly, so the part that
> >>>>> is not aligned with the stripe size is still handled by
> >>>>> raid10_write_request.
> >>>>>
> >>>>> If a reshape is running when the discard bio arrives and the bio
> >>>>> spans the reshape position, raid10_write_request is responsible for
> >>>>> handling this discard bio.
> >>>>>
> >>>>> I did a test with this patch set.
> >>>>> Without the patch:
> >>>>> time mkfs.xfs /dev/md0
> >>>>> real    4m39.775s
> >>>>> user    0m0.000s
> >>>>> sys     0m0.298s
> >>>>>
> >>>>> With the patch:
> >>>>> time mkfs.xfs /dev/md0
> >>>>> real    0m0.105s
> >>>>> user    0m0.000s
> >>>>> sys     0m0.007s
> >>>>>
> >>>>> nvme3n1     259:1  0 477G 0 disk
> >>>>> └─nvme3n1p1 259:10 0  50G 0 part
> >>>>> nvme4n1     259:2  0 477G 0 disk
> >>>>> └─nvme4n1p1 259:11 0  50G 0 part
> >>>>> nvme5n1     259:6  0 477G 0 disk
> >>>>> └─nvme5n1p1 259:12 0  50G 0 part
> >>>>> nvme2n1     259:9  0 477G 0 disk
> >>>>> └─nvme2n1p1 259:15 0  50G 0 part
> >>>>> nvme0n1     259:13 0 477G 0 disk
> >>>>> └─nvme0n1p1 259:14 0  50G 0 part
> >>>>>
> >>>>> v1:
> >>>>> Coly helped to review these patches and gave some suggestions:
> >>>>> One bug was found: if the discard bio crosses one stripe but the bio
> >>>>> size is bigger than one stripe size, the bio will be NULL after
> >>>>> splitting. In this version it checks whether the discard bio size is
> >>>>> bigger than 2*stripe_size.
> >>>>> In raid10_end_discard_request, it is better to check whether
> >>>>> R10BIO_Uptodate is set or not; this avoids a memory write and
> >>>>> improves performance.
> >>>>> Added more comments for calculating addresses.
> >>>>>
> >>>>> Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>
> >>>> The code looks good to me, you may add my Reviewed-by: Coly Li
> >>>> <colyli@xxxxxxx>
> >>>>
> >>>> But I still suggest you add a more detailed commit log, to explain
> >>>> how the discard range of each component disk is decided. Anyway this
> >>>> is just a suggestion, it is up to you.
> >>>>
> >>>> Thank you for this work.
> >>> In raid10_end_discard_request(), among the first 5 lines of local
> >>> variable declarations, 3 lines start with improper blanks (they should
> >>> start with a tab '\t'). You may find this minor issue with
> >>> checkpatch.pl, I guess.
> >>>
> >>> After fixing this format issue, you still have my Reviewed-by :-)
> >>>
> >>> Coly Li
> >> Thanks to Xiao for the patch, and thanks to Coly for the review.
> >>
> >> I did an experiment with a patch to increase max_discard_sectors, like:
> >>
> >> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >> index 14b1ba732cd7d..0b15397ebfaf9 100644
> >> --- a/drivers/md/raid10.c
> >> +++ b/drivers/md/raid10.c
> >> @@ -3762,7 +3762,7 @@ static int raid10_run(struct mddev *mddev)
> >>         chunk_size = mddev->chunk_sectors << 9;
> >>         if (mddev->queue) {
> >>                 blk_queue_max_discard_sectors(mddev->queue,
> >> -                                             mddev->chunk_sectors);
> >> +                                             UINT_MAX);
> >>                 blk_queue_max_write_same_sectors(mddev->queue, 0);
> >>                 blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
> >>                 blk_queue_io_min(mddev->queue, chunk_size);
> >>
> >> This simple change reduces the run time of mkfs.xfs on my SSD from 5+
> >> minutes to about 14 seconds. Of course this is not as good as the ~2
> >> seconds with the set, but I wonder whether the time saving justifies
> >> the extra complexity. Thoughts on this?
> >>
> >> Also, as Coly mentioned, please run checkpatch.pl on the patches. I
> >> have seen warnings on all 3 patches in the set.
> >>
> >> Thanks,
> >> Song
> >>
> > Hi Song
> >
> > In my environment, mkfs.xfs still takes a very long time if I only
> > modify the max discard sectors. I don't quite follow your test result,
> > though: the bio still needs to be split based on the chunk size, so
> > shouldn't modifying only the max discard sectors make no difference?
> >
> > And thanks Coly for the review. I'll fix the problem in the next version.
> >
> Changing only the max discard sectors to UINT_MAX does take effect in my
> environment:
> time mkfs.xfs /dev/md0 -f
> real    50m1.674s (no change)
> real    13m19.367s (with change)
>
> Maybe it depends on the disks. In my environment, it still takes a long
> time.

I see. Let's fix it with this patch set then. Please fix the checkpatch.pl
warnings and resubmit with Coly's Reviewed-by tag.

Thanks,
Song
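
For readers following the layout discussion above, here is a small
standalone C sketch; it is not part of the patch set or the thread. It
rebuilds the 5-disk offset-layout table from the commit message. The
mapping rule in the loop (copy k of data chunk n sits on disk
(n + k) % NDISKS, on row COPIES * (n / NDISKS) + k) is inferred from that
table, and the names NDISKS, NCHUNKS, COPIES and layout[] are illustrative
only.

/*
 * Standalone illustration (not from the patch set): rebuild the 5-disk
 * offset-layout table used in the commit message, to show why a
 * sub-stripe discard such as D03..D10 cannot be expressed as one
 * contiguous range on every member disk.
 */
#include <stdio.h>

#define NDISKS  5
#define NCHUNKS 10   /* data chunks D01 .. D10 */
#define COPIES  2    /* offset layout: second copy on the next row, shifted by one disk */

int main(void)
{
	int layout[COPIES * (NCHUNKS / NDISKS)][NDISKS];
	int rows = COPIES * (NCHUNKS / NDISKS);
	int n, k, row, disk;

	/* Place both copies of every data chunk (rule inferred from the table). */
	for (n = 0; n < NCHUNKS; n++)
		for (k = 0; k < COPIES; k++)
			layout[COPIES * (n / NDISKS) + k][(n + k) % NDISKS] = n;

	/* Print the table row by row; it matches the one in the commit message. */
	for (row = 0; row < rows; row++) {
		for (disk = 0; disk < NDISKS; disk++)
			printf("D%02d ", layout[row][disk] + 1);
		printf("\n");
	}
	return 0;
}

Reading disk3's column of that output (D03, D02, D08, D07) shows the hole
the commit message describes: a single contiguous discard on disk3 starting
at D03 (row 0) and running through D07 (row 3) would also wipe D02 in
row 1, which lies outside the requested D03..D10 range, so the part not
aligned with the stripe has to fall back to raid10_write_request().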
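
To put very rough numbers on the request counts behind the mkfs.xfs timings
discussed above, here is another back-of-the-envelope sketch of mine, not
from the thread. It assumes the 5 x 50G member partitions from the lsblk
output, two copies, and a 512 KiB chunk size; the chunk size and the
resulting usable capacity are assumptions the thread does not state.

/*
 * Rough illustration only: how many discard requests cover the whole array
 * when each discard is capped at one chunk, versus a stripe-aware approach
 * that issues large per-disk discards for the aligned region.
 */
#include <stdio.h>

int main(void)
{
	unsigned long long member = 50ULL << 30;     /* 50 GiB per member (from lsblk above)        */
	unsigned long long data   = 5 * member / 2;  /* usable capacity: 5 members, 2 copies        */
	unsigned long long chunk  = 512ULL << 10;    /* ASSUMED chunk size: 512 KiB, not in thread  */

	/* With max_discard_sectors == chunk_sectors, mkfs needs one request per chunk. */
	printf("chunk-capped discard requests: %llu\n", data / chunk);

	/* With the stripe-aware patch, the aligned region needs on the order of
	 * one large discard per member disk. */
	printf("stripe-aware discard requests: ~%d\n", NDISKS_PLACEHOLDER);
	return 0;
}

(In the line above, NDISKS_PLACEHOLDER stands for the five member disks;
replace it with 5 to run the sketch.) This is only a request-count
estimate; as the differing measurements from Song and Xiao Ni show, the
wall-clock cost of each discard also depends heavily on the underlying
disks.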