On Wed, 2018-05-02 at 13:27 +0300, Max Gurtovoy wrote:
> > > > Performance results running fio (8 jobs, 64 iodepth) using ramdisk
> > > > (w/w.o patch):
> > > >
> > > > bs      IOPS(randread)     IOPS(randwrite)
> > > > ------  ---------------    ---------------
> > > > 512     329.4K / 340.3K    379.1K / 387.7K
> >
> > 3% performance drop
> >
> > > > 1K      333.4K / 337.6K    364.2K / 370.0K
> >
> > 1.4% performance drop
> >
> > > > 2K      321.1K / 323.5K    334.2K / 337.8K
> >
> > .75% performance drop
> >
> > I know you said the performance hit was not significant, and by the time
> > you get to 2k reads/writes, I agree with you, but at the low end, I'm
> > not sure you can call 3% "not significant".
> >
> > Is a 3% performance hit better than the transient RNR error? And is
> > this the only solution available?
>
> The problem here is that iSER is using 1 QP per session, and then you
> see the 3% hit. I guess if we ran a test using 64 targets and 64
> sessions (64 QPs), then the locality of the completions would have
> smoothed this hit away.

Yeah, but if you had 64 QPs then your pre-patch numbers would have been
higher too due to greater spreading of the load, so it might have still
come out at 3%.

> I agree that a 3% hit in this scenario is not optimal. I also hope
> that the work that IdanB and I did with mlx5 inline KLM/MTT will
> balance this one as well (currently it should be in the Leon/Jason
> PR; we saw a >3.5x improvement for small IOs in NVMe-oF). Sergey, we
> can check this out in our lab.

You're talking apples to oranges here: an across-the-board performance
hit and an adapter-specific performance increase to offset it. I'm all
for the performance benefit of the inlining, but it's orthogonal to the
performance hit of this change.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
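
[A minimal Python sketch, not from the thread, that recomputes the quoted
drop percentages from the fio table above; it assumes each table cell
lists IOPS in thousands with and without the patch, in that order.]

    # Recompute the per-block-size performance drop from the quoted table.
    table = {
        # bs: ((randread with, without), (randwrite with, without)), in K IOPS
        "512": ((329.4, 340.3), (379.1, 387.7)),
        "1K":  ((333.4, 337.6), (364.2, 370.0)),
        "2K":  ((321.1, 323.5), (334.2, 337.8)),
    }

    def drop_pct(with_patch, without_patch):
        """Relative IOPS loss introduced by the patch, in percent."""
        return 100.0 * (without_patch - with_patch) / without_patch

    for bs, (rd, wr) in table.items():
        print(f"bs={bs:>4}: randread drop {drop_pct(*rd):.1f}%, "
              f"randwrite drop {drop_pct(*wr):.1f}%")

Running this prints roughly 3.2%/2.2% at 512B, 1.2%/1.6% at 1K, and
0.7%/1.1% at 2K (randread/randwrite), which lines up with the ~3%, ~1.4%,
and ~.75% figures quoted above.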