Hi Sagi, I don't share your worries about a bad out-of-the-box experience, for a number of reasons. First of all, this code is part of the upstream kernel, and it will take time until users actually start to use it as is and not as part of some distro backports or MOFED packages.
True, but I am still saying that this feature hurts sync IO, which represents the majority of users. The impact might not be extreme, but it is still a degradation (in the very limited testing I did this morning, I'm seeing a consistent 5%-10% latency increase for low QD workloads, which matches what Yamin reported AFAIR). Having said that, the call is yours to make, as this is a Mellanox device. I absolutely think that this is useful (as I said before); I just don't think it's necessarily a good idea to enable it by default, given that only a limited set of users would take full advantage of it while the rest would see a negative impact (even if it's 10%). I don't have a hard objection here, I just wanted to give you my opinion because mlx5 is an important driver for RDMA users.
Second, Yamin did extensive testing and worked very closely with Or G., and I have very high confidence in the results of their teamwork.
Has anyone tested other RDMA ULPs, such as NFS/RDMA or SRP/iSER? It would be interesting to understand how other subsystems with different characteristics behave with this.