On Wed, Apr 20, 2022 at 12:55 PM Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> This is v2 of this series, which addresses Christoph's feedback and
> fixes some bugs. The first posting is at [1]. A git branch is
> available at [2].
>
> --
>
> I've been doing some work trying to improve the bulk write performance
> of raid5 on large systems with fast NVMe drives. The bottleneck appears
> to be largely lock contention on the hash_lock and device_lock. This
> series improves the situation slightly by addressing a couple of
> low-hanging-fruit ways to take the locks fewer times in the request
> path.
>
> Patch 9 adjusts how batching works by keeping a reference to the
> previous stripe_head in raid5_make_request(). Under most situations,
> this removes the need to take the hash_lock in
> stripe_add_to_batch_list(), which should reduce the number of times the
> lock is taken by roughly a factor of two.
>
> Patch 12 pivots the way raid5_make_request() works. Before the patch,
> the code must find the stripe_head for every 4KB page in the request,
> so each stripe_head must be found once for every data disk. The patch
> changes this so that all the data disks can be added to a stripe_head
> at once, and the number of times the stripe_head must be found (and
> thus the number of times the hash_lock is taken) should be reduced by
> a factor roughly equal to the number of data disks.
>
> The remaining patches are just cleanup and prep patches for those two
> patches.
>
> Doing apples-to-apples testing of this series on a small VM with 5 ram
> disks, I saw a bandwidth increase of roughly 14%, and contention on the
> hash_lock (as reported by lock_stat) dropped by more than a factor of 5
> (though it is still significantly contended).
>
> Testing on larger systems with NVMe drives saw similar bandwidth
> increases, ranging from 3% to 20% depending on the parameters. Oddly,
> small arrays had larger gains, likely because they start from lower
> bandwidths; I would have expected larger gains with larger arrays,
> since even fewer locks should be taken in raid5_make_request().
>
> Logan
>
> [1] https://lkml.kernel.org/r/20220407164511.8472-1-logang@xxxxxxxxxxxx
> [2] https://github.com/sbates130272/linux-p2pmem raid5_lock_cont_v2
>

The set looks good to me overall. Thanks, everyone, for the review and
feedback. Logan, please incorporate the feedback and send v3.

Thanks,
Song
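
For readers skimming the thread, here is a rough userspace sketch of the
patch 9 batching idea: caching the previous stripe_head lets consecutive
full-stripe writes be batched without re-taking hash_lock to look the
previous stripe up again. This is a simplified pthreads model, not the
actual drivers/md/raid5.c code; all names (find_stripe_locked,
add_to_batch) are invented for illustration.

#include <stdio.h>
#include <stddef.h>
#include <pthread.h>

struct stripe_head {
	unsigned long sector;
	struct stripe_head *batch_head;	/* head of the batch we joined */
};

static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long hash_lock_taken;

/* Stand-in for the hash-table lookup that requires hash_lock. */
static struct stripe_head *find_stripe_locked(struct stripe_head *table,
					      size_t n, unsigned long sector)
{
	struct stripe_head *found = NULL;

	pthread_mutex_lock(&hash_lock);
	hash_lock_taken++;
	for (size_t i = 0; i < n; i++)
		if (table[i].sector == sector)
			found = &table[i];
	pthread_mutex_unlock(&hash_lock);
	return found;
}

/*
 * Batch 'sh' behind the stripe that precedes it. If the caller cached
 * that stripe ('prev'), the locked lookup is skipped entirely.
 */
static void add_to_batch(struct stripe_head *table, size_t n,
			 struct stripe_head *sh, struct stripe_head *prev)
{
	if (!prev)
		prev = find_stripe_locked(table, n, sh->sector - 1);
	if (prev)
		sh->batch_head = prev->batch_head ? prev->batch_head : prev;
}

int main(void)
{
	struct stripe_head stripes[3] = { { 0 }, { 1 }, { 2 } };
	struct stripe_head *prev = NULL;

	/* Sequential write: each stripe batches behind the previous one. */
	for (int i = 1; i < 3; i++) {
		add_to_batch(stripes, 3, &stripes[i], prev);
		prev = &stripes[i];
	}
	printf("hash_lock acquisitions: %lu\n", hash_lock_taken);
	return 0;
}

Only the first stripe needs the locked lookup here (prev starts out
NULL); every later stripe reuses the cached reference, which is where
the roughly 2x reduction in hash_lock acquisitions comes from.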
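
Similarly, a simplified model of the patch 12 idea: look each stripe up
once and attach the pages for all data disks in one pass, instead of one
lookup per 4KB page. Again a pthreads sketch with invented names, not
the actual md code.

#include <stdio.h>
#include <pthread.h>

#define DATA_DISKS 4

static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lookups;

/* Stand-in for the stripe lookup done under hash_lock. */
static void find_stripe(unsigned long stripe_nr)
{
	(void)stripe_nr;
	pthread_mutex_lock(&hash_lock);
	lookups++;			/* one hash_lock acquisition */
	pthread_mutex_unlock(&hash_lock);
}

/* Old scheme: one lookup per 4KB page of the request. */
static void make_request_per_page(unsigned long first, unsigned long last)
{
	for (unsigned long s = first; s <= last; s++)
		for (int d = 0; d < DATA_DISKS; d++)
			find_stripe(s);
}

/* New scheme: one lookup per stripe; all data disks added at once. */
static void make_request_per_stripe(unsigned long first, unsigned long last)
{
	for (unsigned long s = first; s <= last; s++) {
		find_stripe(s);
		/* ... add the pages for all DATA_DISKS here ... */
	}
}

int main(void)
{
	make_request_per_page(0, 99);
	printf("per-page lookups:   %lu\n", lookups);

	lookups = 0;
	make_request_per_stripe(0, 99);
	printf("per-stripe lookups: %lu\n", lookups);
	return 0;
}

For a 100-stripe request this prints 400 versus 100 lookups, i.e. the
reduction by a factor equal to the number of data disks that the cover
letter describes.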