On 1/20/22 07:51, Guangguan Wang wrote:
This implements rq flow control in the smc-r link layer. QPs
communicating without rq flow control, as in the previous
version, may hit RNR (receiver not ready) errors, which
occur when the sq sends a message to the remote qp but the
remote qp's rq has no valid rq entries to receive the message.
In the RNR condition, the rdma transport layer may retransmit
the messages again and again until the rq has entries available,
which may lower performance, especially under heavy traffic.
Using credits to do rq flow control can avoid the occurrence
of RNR.
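
For readers following the thread, the credit idea described above can be
sketched roughly as follows. This is only a minimal illustration in plain C,
not the code from the patch: the struct and all function names
(credit_link, credit_try_send, and so on) are hypothetical, and it assumes
the receiver announces newly posted rq entries to the sender piggybacked on
some message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-link credit state; not taken from the SMC-R patch. */
struct credit_link {
	int peer_rq_credits;  /* rq entries the peer has announced to us */
	int local_rq_posted;  /* rq entries we re-posted but not yet announced */
};

static void credit_link_init(struct credit_link *l, int initial_peer_credits)
{
	l->peer_rq_credits = initial_peer_credits;
	l->local_rq_posted = 0;
}

/*
 * Sender side: consume one credit per message. Returning false means the
 * caller has to hold the message back instead of posting it and risking RNR.
 */
static bool credit_try_send(struct credit_link *l)
{
	assert(l->peer_rq_credits >= 0);
	if (l->peer_rq_credits == 0)
		return false;
	l->peer_rq_credits--;
	return true;
}

/* Receiver side: call after re-posting one receive buffer to the rq. */
static void credit_rq_reposted(struct credit_link *l)
{
	l->local_rq_posted++;
}

/* Receiver side: fetch (and reset) the credits to announce to the peer. */
static int credit_take_announce(struct credit_link *l)
{
	int n = l->local_rq_posted;

	l->local_rq_posted = 0;
	return n;
}

/* Sender side: the peer announced n newly available rq entries. */
static void credit_grant(struct credit_link *l, int n)
{
	assert(n >= 0);
	l->peer_rq_credits += n;
}
```

With this scheme the sender never posts more messages than the peer has rq
entries for, so the RNR condition cannot arise on this path; the cost is that
the credit count has to travel over the wire somehow.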
Those are some truly substantial improvements!
But we need to be careful with protocol-level changes: There are other operating
systems like z/OS and AIX which have compatible implementations of SMC, too.
Changes like a reduction of connections per link group or usage of reserved
fields would need to be coordinated, and likely would have unwanted side-effects
even when used with older Linux kernel versions.
Changing the protocol is "expensive" insofar as it requires time to thoroughly
discuss the changes, perform compatibility tests, and so on.
So I would like to urge you to investigate alternative ways that do not require
protocol-level changes to address this scenario, e.g. by modifying the number of
completion queue elements, to see if this could yield similar results.
Thx!