On Tue, 2018-01-09 at 17:29 -0700, Jens Axboe wrote:
> Move completion related items (like the call single data) near the
> end of the struct, instead of mixing them in with the initial
> queueing related fields.
>
> Move queuelist below the bio structures. Then we have all
> queueing related bits in the first cache line.
>
> This yields a 1.5-2% increase in IOPS for a null_blk test, both for
> sync and for high thread count access. Sync test goes from 975K to
> 992K, 32-thread case from 20.8M to 21.2M IOPS.

That's a nice result!

Reviewed-by: Bart Van Assche <bart.vanassche@xxxxxxx>
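
For readers following along, the layout idea in the quoted description can be sketched roughly as below. This is only an illustration, not the actual struct request from the patch: the struct demo_request type, its field names, and the 64-byte DEMO_CACHE_LINE value are all assumptions made up for the example. It groups the queueing-side fields at the front, pushes the completion-side fields to the end, and uses offsetof() to check that the queueing fields end within the first cache line.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; real hardware may differ. */
#define DEMO_CACHE_LINE 64

/* End offset (exclusive) of a member within a struct type. */
#define DEMO_FIELD_END(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical stand-in for a doubly linked list node. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* Hypothetical stand-in for per-request completion call data. */
struct demo_call_single_data {
	void (*func)(void *info);
	void *info;
	unsigned int flags;
};

/* Hypothetical request layout: queueing fields first, completion last. */
struct demo_request {
	/* Queueing-related fields, touched on the submission path. */
	void *q;
	void *ctx;
	uint32_t cmd_flags;
	uint32_t rq_flags;
	uint64_t sector;
	void *bio;
	void *biotail;
	struct demo_list_head queuelist;	/* placed below the bio pointers */

	/* Completion-related fields, touched only when the request ends. */
	struct demo_call_single_data csd;
	void (*end_io)(struct demo_request *rq, int error);
	void *end_io_data;
};

/* Returns 1 if every queueing field ends within the first cache line. */
static int demo_hot_fields_in_first_line(void)
{
	return DEMO_FIELD_END(struct demo_request, queuelist) <= DEMO_CACHE_LINE;
}
```

On the real structure, pahole is the usual way to inspect member offsets and cache line boundaries; the helper above only shows the kind of check one might do by hand.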