On Mon, May 13, 2019 at 02:03:44PM +0200, Christoph Hellwig wrote:
> On Mon, May 13, 2019 at 05:45:45PM +0800, Ming Lei wrote:
> > On Mon, May 13, 2019 at 08:37:45AM +0200, Christoph Hellwig wrote:
> > > Currently ll_merge_requests_fn, unlike all other merge functions,
> > > reduces nr_phys_segments by one if the last segment of the previous,
> > > and the first segment of the next segment are contiguous. While this
> > > seems like a nice solution to avoid building smaller than possible
> >
> > Some workloads need this optimization, please see 729204ef49ec00b
> > ("block: relax check on sg gap"):
>
> And we still allow to merge the segments with this patch.  The only
> difference is that these merges do get accounted as extra segments.

This way .nr_phys_segments can easily reach the max segment limit, and
then no new bio can be merged any more.

We don't consider segment merging between two bios in ll_new_hw_segment().
In my mkfs test over virtio-blk, the request size can easily be increased
to ~1M (several segments) from 63k (126 bios/segments) if segment merging
between two bios is taken into account.

> > > requests it causes a mismatch between the segments actually present
> > > in the request and those iterated over by the bvec iterators, including
> > > __rq_for_each_bio. This could cause overwrites of too small kmalloc
> >
> > Request based drivers usually shouldn't iterate bios any more.
>
> We do that in a couple of places.  For one the nvme single segment
> optimization that triggered this bug.  Also for range discard support
> in nvme and virtio.  Then we have loop that iterates the segments, but
> doesn't use the nr_phys_segments count, and plenty of others that
> iterate over pages at the moment but should be iterating bvecs,
> e.g. ubd or aoe.

It seems the discard segments don't take bio merging into account for
nvme and virtio, so they should be fine in this respect. I will take a
closer look at nvme/virtio discard segment merging later.

Thanks,
Ming
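
P.S. For anyone following along, below is a minimal sketch of the pattern
the mismatch can break. It is not the actual nvme/virtio discard code:
struct my_dsm_range and my_setup_discard() are made-up names for
illustration, while blk_rq_nr_phys_segments(), __rq_for_each_bio() and the
bio fields are the real kernel helpers.

#include <linux/blkdev.h>
#include <linux/slab.h>

/* Hypothetical per-range descriptor, stand-in for a real DSM/discard range. */
struct my_dsm_range {
	u64 slba;
	u32 nlb;
};

static int my_setup_discard(struct request *req)
{
	unsigned short segments = blk_rq_nr_phys_segments(req);
	struct my_dsm_range *range;
	struct bio *bio;
	int n = 0;

	/* The array is sized by the accounted segment count... */
	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
	if (!range)
		return -ENOMEM;

	/*
	 * ...but filled once per bio.  If ll_merge_requests_fn dropped
	 * nr_phys_segments below the number of bios in the request, the
	 * writes below run past the end of the kmalloc'ed buffer.
	 */
	__rq_for_each_bio(bio, req) {
		range[n].slba = bio->bi_iter.bi_sector;
		range[n].nlb = bio->bi_iter.bi_size >> 9;
		n++;
	}

	/* submit range[0..n-1] to the device here */
	kfree(range);
	return 0;
}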