On 2022-04-24 01:53, Guoqing Jiang wrote: > > > On 4/21/22 3:54 AM, Logan Gunthorpe wrote: >> Hi, >> >> This is v2 of this series which addresses Christoph's feedback and >> fixes some bugs. The first posting is at [1]. A git branch is >> available at [2]. >> >> -- >> >> I've been doing some work trying to improve the bulk write performance >> of raid5 on large systems with fast NVMe drives. The bottleneck appears >> largely to be lock contention on the hash_lock and device_lock. This >> series improves the situation slightly by addressing a couple of low >> hanging fruit ways to take the lock fewer times in the request path. >> >> Patch 9 adjusts how batching works by keeping a reference to the >> previous stripe_head in raid5_make_request(). Under most situtations, >> this removes the need to take the hash_lock in stripe_add_to_batch_list() >> which should reduce the number of times the lock is taken by a factor of >> about 2. >> >> Patch 12 pivots the way raid5_make_request() works. Before the patch, the >> code must find the stripe_head for every 4KB page in the request, so each >> stripe head must be found once for every data disk. The patch changes this >> so that all the data disks can be added to a stripe_head at once and the >> number of times the stripe_head must be found (and thus the number of >> times the hash_lock is taken) should be reduced by a factor roughly equal >> to the number of data disks. >> >> The remaining patches are just cleanup and prep patches for those two >> patches. >> >> Doing apples to apples testing this series on a small VM with 5 ram >> disks, I saw a bandwidth increase of roughly 14% and lock contentions >> on the hash_lock (as reported by lock stat) reduced by more than a factor >> of 5 (though it is still significantly contended). >> >> Testing on larger systems with NVMe drives saw similar small bandwidth >> increases from 3% to 20% depending on the parameters. Oddly small arrays >> had larger gains, likely due to them having lower starting bandwidths; I >> would have expected larger gains with larger arrays (seeing there >> should have been even fewer locks taken in raid5_make_request()). >> >> Logan >> >> [1] https://lkml.kernel.org/r/20220407164511.8472-1-logang@xxxxxxxxxxxx >> [2] https://github.com/sbates130272/linux-p2pmem raid5_lock_cont_v2 >> >> -- >> >> Changes since v1: >> - Rebased on current md-next branch (190a901246c69d79) >> - Added patch to create a helper for checking if a sector >> is ahead of the reshape (per Christoph) >> - Reworked the __find_stripe() patch to create a find_get_stripe() >> helper (per Christoph) >> - Added more patches to further refactor raid5_make_request() and >> pull most of the loop body into a helper function (per Christoph) >> - A few other minor cleanups (boolean return, droping casting when >> printing sectors, commit message grammar) as suggested by Christoph. >> - Fixed two uncommon but bad data corruption bugs in that were found. >> >> -- >> >> Logan Gunthorpe (12): >> md/raid5: Factor out ahead_of_reshape() function >> md/raid5: Refactor raid5_make_request loop >> md/raid5: Move stripe_add_to_batch_list() call out of add_stripe_bio() >> md/raid5: Move common stripe count increment code into __find_stripe() >> md/raid5: Factor out helper from raid5_make_request() loop >> md/raid5: Drop the do_prepare flag in raid5_make_request() >> md/raid5: Move read_seqcount_begin() into make_stripe_request() >> md/raid5: Refactor for loop in raid5_make_request() into while loop >> md/raid5: Keep a reference to last stripe_head for batch >> md/raid5: Refactor add_stripe_bio() >> md/raid5: Check all disks in a stripe_head for reshape progress >> md/raid5: Pivot raid5_make_request() > > Generally, I don't object the cleanup patches since the code looks more > cleaner. > But my concern is that since some additional function calls are added to > hot path > (raid5_make_request), could the performance be affected? There's a bit of logic added to the raid5_make_requests but it is all local and should be fast, and it reduces the amount of calls to the slow contended locks. > And I think patch 9 and patch 12 are helpful for performance > improvement, did > you measure the performance without those cleanup patches? Yes, I compared performance with and without this entire series. Logan