Re: [PATCH net v1] net/smc: avoid data corruption caused by decline

Wenjia Zhang <wenjia@xxxxxxxxxxxxx> · Mon, 13 Nov 2023 11:57:49 +0100

On 13.11.23 03:50, D. Wythe wrote:


On 11/10/23 10:51 AM, D. Wythe wrote:


On 11/8/23 9:00 PM, Wenjia Zhang wrote:


On 08.11.23 10:48, D. Wythe wrote:
From: "D. Wythe" <alibuda@xxxxxxxxxxxxxxxxx>

We found a data corruption issue during testing of SMC-R on Redis
applications.

The benchmark has a low probability of reporting a strange error as
shown below.

"Error: Protocol error, got "\xe2" as reply type byte"

Finally, we found that the retrieved error data was as follows:

0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2

It is quite obvious that this is a SMC DECLINE message, which means 
that
the applications received SMC protocol message.
We found that this was caused by the following situations:

client            server
       proposal
    ------------->
       accept
    <-------------
       confirm
    ------------->
wait confirm

     failed llc confirm
        x------
(after 2s)timeout
            wait rsp

wait decline

(after 1s) timeout
            (after 2s) timeout
        decline
    -------------->
        decline
    <--------------

As a result, a decline message was sent in the implementation, and this
message was read from TCP by the already-fallback connection.

This patch double the client timeout as 2x of the server value,
With this simple change, the Decline messages should never cross or
collide (during Confirm link timeout).

This issue requires an immediate solution, since the protocol updates
involve a more long-term solution.

Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC 
flow")
Signed-off-by: D. Wythe <alibuda@xxxxxxxxxxxxxxxxx>
---
  net/smc/af_smc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index abd2667..5b91f55 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -599,7 +599,7 @@ static int smcr_clnt_conf_first_link(struct 
smc_sock *smc)
      int rc;
        /* receive CONFIRM LINK request from server over RoCE fabric */
-    qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME,
+    qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME,
                    SMC_LLC_CONFIRM_LINK);
      if (!qentry) {
          struct smc_clc_msg_decline dclc;
I'm wondering if the double time (if sufficient) of timeout could be 
for waiting for CLC_DECLINE on the client's side. i.e.


It depends. We can indeed introduce a sysctl to allow server to 
manager their Confirm Link timeout,
but if there will be protocol updates, this introduction will no 
longer be necessary, and we will
have to maintain it continuously.

no, I don't think, either, that we need a sysctl for that.
I believe the core of the solution is to ensure that decline messages 
never cross or collide. Increasing
the client's timeout by twice as much as the server's timeout can 
temporarily solve this problem.

I have no objection with that, but my question is why you don't increase 
the timeout waiting for CLC_DECLINE instead of waiting LLC_Confirm_Link? 
Shouldn't they have the same effect?

If Jerry's proposed protocol updates are too complex or if there won't 
be any future protocol updates,
it's still not late to let server manager their Confirm Link timeout 
then.

Best wishes,
D. Wythe


FYI:

It seems that my email was not successfully delivered due to some 
reasons. Sorry
for that.

D. Wythe



diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 35ddebae8894..9b1feef1013d 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -605,7 +605,7 @@ static int smcr_clnt_conf_first_link(struct 
smc_sock *smc)
                struct smc_clc_msg_decline dclc;

                rc = smc_clc_wait_msg(smc, &dclc, sizeof(dclc),
-                                     SMC_CLC_DECLINE, 
CLC_WAIT_TIME_SHORT);
+                                     SMC_CLC_DECLINE, 2 * 
CLC_WAIT_TIME_SHORT);
                return rc == -EAGAIN ? SMC_CLC_DECL_TIMEOUT_CL : rc;
        }
        smc_llc_save_peer_uid(qentry);

Because the purpose is to let the server have the control to deline.