On Mon, May 28, 2012 at 11:34:34PM -0400, Kent Overstreet wrote:
> On Tue, May 29, 2012 at 11:54:38AM +1000, Dave Chinner wrote:
> > It also allowed us to build IOs that span
> > entire RAID stripe widths, thereby avoiding potential RAID RMW
> > cycles, and even allowing high end raid controllers to trigger BBWC
> > bypass fast paths that could double or triple the write throughput
> > of the arrays...
>
> merge_bvec_fn has nothing to do with that though, since for one there

You're mistaking me for someone who cares about merge_bvec_fn().
Someone asked me to describe why XFS uses bio_add_page()....

> aren't any merge_bvec_fn's being called in the IO paths on these high
> end disk arrays,

Yes there are, because high bandwidth filesystems use software RAID 0
striping to stripe multiple hardware RAID luns together to achieve
the necessary bandwidth. Hardware RAID is used for disk failure
prevention and to manage 1000 disks more easily, while software RAID
(usually with multipathing) is used to scale the performance....

> and for our software raid implementations their
> merge_bvec_fns will keep you from sending them bios that span entire
> stripes.

Well, yeah, the lower layer has to break up large bios into chunks
for its sub-devices. What matters is that we build IOs that are
larger than what the lower layers break them up into.

e.g. if your hardware RAID5 stripe width is 1MB and the software
RAID chunk size is 1MB (so the software RAID stripe width is
N luns x 1MB), then all that matters is that we build IOs larger
than 1MB so that we get full stripe writes at the hardware RAID
level and so avoid RMW cycles right at the bottom of the IO stack...

As long as the new code still allows us to achieve the same or
better IO sizes without any new overhead, then I simply don't care
what happens to the guts of bio_add_page().

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
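
To make the stripe arithmetic in the mail above concrete, here is a
minimal, purely illustrative userspace sketch. It is not XFS or block
layer code; the helper name (full_stripes_covered) and the 1MB geometry
constants are made up to match the example given in the mail. The point
it demonstrates is that only the whole 1MB chunks an IO covers become
full stripe writes at the hardware RAID level, so any IO smaller than a
chunk is guaranteed to trigger a RMW cycle.

#include <stdio.h>

#define HW_STRIPE_WIDTH	(1024ULL * 1024)	/* 1MB hardware RAID5 stripe width */
#define SW_CHUNK_SIZE	HW_STRIPE_WIDTH		/* software RAID0 chunk == hw stripe */

/*
 * How many full hardware stripes does an IO starting at @offset and
 * spanning @size bytes cover? Only these become full stripe writes;
 * everything else is a partial write that forces a RMW cycle.
 */
static unsigned long long
full_stripes_covered(unsigned long long offset, unsigned long long size)
{
	/* first chunk boundary at or after the IO start */
	unsigned long long first = (offset + SW_CHUNK_SIZE - 1) / SW_CHUNK_SIZE;
	/* last chunk boundary at or before the IO end */
	unsigned long long last = (offset + size) / SW_CHUNK_SIZE;

	return last > first ? last - first : 0;
}

int
main(void)
{
	/* a 4MB IO aligned to the stripe: four full stripe writes, no RMW */
	printf("4MB aligned IO -> %llu full stripes\n",
	       full_stripes_covered(0, 4 * HW_STRIPE_WIDTH));

	/* a 512KB IO can never cover a full 1MB stripe: guaranteed RMW */
	printf("512KB IO       -> %llu full stripes\n",
	       full_stripes_covered(0, 512 * 1024));
	return 0;
}

Compiled and run as-is, the 4MB IO reports 4 full stripes while the
512KB IO reports none, which is the sizing argument being made above:
what matters is building IOs at least as large as the stripe geometry
at the bottom of the stack, regardless of how the layers in between
split them up.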