On 5/8/2019 4:17 PM, Christoph Hellwig wrote:
On Tue, May 07, 2019 at 04:38:39PM +0300, Max Gurtovoy wrote:
Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o patch):
bs     IOPS(read)         IOPS(write)
----   ---------------    ---------------
512    1266.4K/1262.4K    1720.1K/1732.1K
4k     793139/570902      1129.6K/773982
32k    72660/72086        97229/96164
Using write_generate=0 and read_verify=0 (w/w.o patch):
bs     IOPS(read)         IOPS(write)
----   ---------------    ---------------
512    1590.2K/1600.1K    1828.2K/1830.3K
4k     1078.1K/937272     1142.1K/815304
32k    77012/77369        98125/97435
So this makes almost no difference for 512-byte or 32k block sizes,
but a huge difference for 4k, which seems a little odd. Do you have
a good explanation for that?
Yes. The servers used for the measurements weren't strong enough to
show the improvement for 512B.
We'll try to find stronger servers for that.
For 32K the result is expected: it doesn't fall into the PA mapping
case (sg_nents == 1).
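
To make the sg_nents == 1 condition concrete, here is a minimal
standalone sketch. It is not the driver code: the struct and both
helpers (demo_sg_mapping, demo_can_use_pa_mapping,
demo_worst_case_nents) are hypothetical names for illustration, and it
assumes 4K pages, page-aligned data and no merging of adjacent pages.
Under those assumptions a 512B or 4K block maps to one entry, while a
32K block maps to several and so misses the PA mapping case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mapped scatterlist; not the kernel's struct. */
struct demo_sg_mapping {
	size_t sg_nents;	/* number of mapped scatter/gather entries */
};

/*
 * Returns true when the buffer maps to exactly one entry, which is the
 * only case the thread says can use the PA mapping path.
 */
static bool demo_can_use_pa_mapping(const struct demo_sg_mapping *m)
{
	return m->sg_nents == 1;
}

/*
 * Worst-case entry count for a buffer of len bytes, assuming (for
 * illustration only) page-aligned data and no merging of adjacent pages.
 */
static size_t demo_worst_case_nents(size_t len, size_t page_size)
{
	return (len + page_size - 1) / page_size;
}
```

With a 4096-byte page, demo_worst_case_nents() gives 1 for 512 and 4096
bytes and 8 for 32768 bytes, so only the first two would pass
demo_can_use_pa_mapping() in this simplified model.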
case IB_WR_REG_MR_INTEGRITY:
- memset(&reg_pi_wr, 0, sizeof(struct ib_reg_wr));
Btw, I think the driver would really benefit from eventually splitting
out each case in this huge switch statement into a helper. Every time
I had to stare at it, it took me forever to understand it.
Sure, this is exactly what I thought during the development. It's on
our plate after merging this series, which is already big enough.