Hello,

I noticed that LIO's WRITE_SAME implementation doesn't perform as well as I would expect. In fact, non-accelerated ESXi eager disk zeroing over 10GE iSCSI is two times faster than the accelerated zeroing in our test lab.

Initiator setup: Dell R300, ESXi 5.5, software iSCSI over 1GE/10GE.
Target setup: Dell R510, PERC H700 controller, 3x Intel S3700 SSD in RAID0, exported as an iblock iSCSI LUN, 3.15.5 kernel.

After some digging in the sources and playing with blktrace, I think the problem comes from the way iblock_execute_write_same() handles sequential writes: it adds them to the bios in small 512-byte blocks, i.e. one sector at a time. ESXi, on the other hand, issues the non-accelerated writes as 128kB blocks. The difference in the blktrace output is obvious. I tried tweaking the deadline scheduler and other tunables, but with no luck.

To verify my idea, I modified iblock_execute_write_same() so that it submits full ZERO_PAGEs, by analogy with __blkdev_issue_zeroout(). In other words, I increased the bio block size from 512 to 4096 bytes. This quick-and-dirty hack raised WRITE_SAME performance several times over, and it is now close to the maximum sequential write speed of the raw device. Moreover, target CPU usage and softirqs dropped significantly too.

Maybe iblock_execute_write_same() should prepare a longer contiguous data block before it starts submitting it to the block layer? And/or use ZERO_PAGE when the command is used for zeroing (99% of cases today, I guess)?

Any ideas?

Thanks,
Martin
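
P.S. To make the idea concrete, here is a rough sketch of the direction I mean, loosely modelled on __blkdev_issue_zeroout() in block/blk-lib.c. This is not the actual hack I tested: the function names (zero_fill_sketch, zero_fill_end_io) are illustrative only, it assumes the ~3.15 bio API (bi_iter, submit_bio(rw, bio)), and waiting for bio completion is left out for brevity.

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <asm/pgtable.h>	/* ZERO_PAGE() */

/* Illustrative completion handler: just free the bio. */
static void zero_fill_end_io(struct bio *bio, int err)
{
	bio_put(bio);
}

/*
 * Sketch only: pack whole pages of zeroes into each bio instead of one
 * 512-byte sector at a time, the way __blkdev_issue_zeroout() does.
 */
static int zero_fill_sketch(struct block_device *bdev, sector_t sector,
			    sector_t nr_sects, gfp_t gfp_mask)
{
	struct bio *bio;
	unsigned int sz;
	int ret;

	while (nr_sects) {
		bio = bio_alloc(gfp_mask,
				min(nr_sects, (sector_t)BIO_MAX_PAGES));
		if (!bio)
			return -ENOMEM;

		bio->bi_iter.bi_sector = sector;
		bio->bi_bdev = bdev;
		bio->bi_end_io = zero_fill_end_io;

		while (nr_sects) {
			/* up to PAGE_SIZE of zeroes per bio segment */
			sz = min((sector_t)PAGE_SIZE >> 9, nr_sects);
			ret = bio_add_page(bio, ZERO_PAGE(0), sz << 9, 0);
			nr_sects -= ret >> 9;
			sector += ret >> 9;
			if (ret < (sz << 9))
				break;	/* bio is full, start a new one */
		}

		submit_bio(WRITE, bio);
	}

	return 0;
}

Besides the larger requests on the wire to the device, fewer and bigger bio segments per amount zeroed would also fit the drop in CPU usage and softirqs I saw.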