On Fri, Nov 14 2008, Rusty Russell wrote:
> This allows more requests to fit in the descriptor ring.
>
> Copying 1.7M kernel image 100 times (with sync between)
> Before: totsegs = 55661 totlen = 148859962 avg: 2674
> After:  totsegs = 36097 totlen = 139439355 avg: 3862
>
> Unfortunately, this coalescing is done at blk_rq_map_sg(), which is
> too late to be optimal: requests have already been limited to the
> value set by blk_queue_max_hw_segments(). For us, that value reflects
> the number of sg slots we can handle (ie. after clustering).
>
> I suspect other drivers have the same issue. Jens?

blk_queue_max_hw_segments() is the size of your sg list, so yes, the
block layer will stop merging more into a request once it goes beyond
that. But it tracks merging along the way, so I don't see why there
would be a discrepancy between the two ends. Unless there's a bug
there, of course...

Queue clustering is on by default when you allocate your queue, though,
so I'm surprised you see a difference from doing:

+	/* Gather adjacent buffers to minimize sg length. */
+	queue_flag_set(QUEUE_FLAG_CLUSTER, vblk->disk->queue);

Did test_bit(QUEUE_FLAG_CLUSTER, &vblk->disk->queue->queue_flags)
really return 0 before?

-- 
Jens Axboe
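P.S.: one quick way to settle that last question would be to print the
flag's state in the driver's probe path, right after the disk and its
queue are set up, with the queue_flag_set() line removed. Untested
sketch only; the dev_info() messages are just illustrative, and vdev is
assumed to be the struct virtio_device * in scope in virtblk_probe():

	/* Sketch: report the initial cluster-flag state before the
	 * driver touches it, so we can see the allocation default. */
	if (test_bit(QUEUE_FLAG_CLUSTER, &vblk->disk->queue->queue_flags))
		dev_info(&vdev->dev, "clustering on by default\n");
	else
		dev_info(&vdev->dev, "clustering off by default\n");

If that prints "off", then something cleared the flag (or never set it)
between queue allocation and probe, which would explain the difference
you measured.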