On Tue, May 07, 2019 at 04:38:39PM +0300, Max Gurtovoy wrote:
> Performance results running fio (24 jobs, 128 iodepth) using
> write_generate=1 and read_verify=1 (w/w.o patch):
>
> bs      IOPS(read)        IOPS(write)
> ----    ----------        ----------
> 512     1266.4K/1262.4K   1720.1K/1732.1K
> 4k      793139/570902     1129.6K/773982
> 32k     72660/72086       97229/96164
>
> Using write_generate=0 and read_verify=0 (w/w.o patch):
> bs      IOPS(read)        IOPS(write)
> ----    ----------        ----------
> 512     1590.2K/1600.1K   1828.2K/1830.3K
> 4k      1078.1K/937272    1142.1K/815304
> 32k     77012/77369       98125/97435

So this makes almost no difference for 512-byte or 32k block sizes,
but a huge difference for 4k, which seems a little odd.  Do you have
a good explanation for that?

>  	case IB_WR_REG_MR_INTEGRITY:
> -		memset(&reg_pi_wr, 0, sizeof(struct ib_reg_wr));

Btw, I think the driver would really benefit from eventually splitting
out each case in this huge switch statement into a helper.  Every time
I had to stare at it, it took me forever to understand it.
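
To illustrate the kind of split meant by "a helper per case", here is a
minimal standalone sketch.  It is not the driver's code: the opcodes,
struct send_ctx, the flag values and all handle_*() names are
hypothetical and only show the shape, in which the switch does nothing
but dispatch and each case body lives in its own small function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes; these are not the real ib_wr_opcode values. */
enum wr_opcode {
	WR_SEND,
	WR_LOCAL_INV,
	WR_REG_MR,
};

/* Hypothetical per-request state that every helper works on. */
struct send_ctx {
	int num_sge;
	unsigned int flags;
	char last_op[16];
};

/* One helper per case: each can be read and reviewed on its own. */
static int handle_send(struct send_ctx *ctx)
{
	ctx->num_sge = 1;
	strcpy(ctx->last_op, "send");
	return 0;
}

static int handle_local_inv(struct send_ctx *ctx)
{
	ctx->num_sge = 0;
	ctx->flags |= 0x1;	/* made-up "invalidate" flag */
	strcpy(ctx->last_op, "local_inv");
	return 0;
}

static int handle_reg_mr(struct send_ctx *ctx)
{
	ctx->num_sge = 0;
	ctx->flags |= 0x2;	/* made-up "registration" flag */
	strcpy(ctx->last_op, "reg_mr");
	return 0;
}

/* The switch only dispatches; unknown opcodes return -1. */
int post_one(struct send_ctx *ctx, enum wr_opcode op)
{
	assert(ctx != NULL);

	switch (op) {
	case WR_SEND:
		return handle_send(ctx);
	case WR_LOCAL_INV:
		return handle_local_inv(ctx);
	case WR_REG_MR:
		return handle_reg_mr(ctx);
	default:
		return -1;
	}
}
```

With this shape, adding or changing an opcode touches one helper plus
one line in the switch, and the state that has to be reset for a given
case sits next to the code that uses it.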