On 1/8/2025 10:03 AM, He X wrote:
> Ok, that's important and perhaps this needs more digging. What was
> your setup? Was it an iWARP connection, for example?
Direct connection between two mlx5_ib, ROCE network.
> If IRD/ORD is the problem, you'll see connections break when write-heavy
> workloads are present. Is that what you mean by "did not work"?
Yes. It only disconnects when copying large files from clients (cifs) to
ksmbd. I do see some retrying in the logs, but it is not able to recover.
I have cleared my testing logs, so I cannot paste them here.
Ok. The interesting item would be the work request completion status
that preceded the connection failure, or the async error upcall event
from the rdma driver if that triggered first. Both client and server
logs are needed. And it can be a higher-level issue too, there were
some signing issues related to the fscache changes, these might be
in kernel 6.12. I tested mostly successfully at SDC in September with
them, anyway.
There may well be something else going on - RoCE can be very tricky
to set up since it depends on link layer flow control. You're not
using RoCEv2?
BTW the code does have some strange-looking defaults between client
and server IRD/ORD queue depths. The server defaults to 8 ORD, while
the client defaults to 32 IRD. This is odd, but not in itself fatal.
After all, other implementations (e.g. Windows) have their own defaults
too. The negotiation at both RDMA and SMB Direct should align them.
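
To illustrate why mismatched defaults are not fatal by themselves, here is a
minimal user-space sketch. It assumes the negotiation settles on the smaller of
the initiator's ORD and the responder's IRD. The helper name
effective_read_depth is made up for this example and is not a ksmbd or cifs
function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the number of RDMA Reads that can actually be
 * outstanding is bounded by both the initiator's ORD and the responder's
 * IRD, so the effective depth is modeled as the minimum of the two.
 */
static inline uint8_t effective_read_depth(uint8_t initiator_ord,
					   uint8_t responder_ird)
{
	return initiator_ord < responder_ird ? initiator_ord : responder_ird;
}

/* Self-check using the defaults quoted above: server ORD 8, client IRD 32. */
static inline void check_quoted_defaults(void)
{
	assert(effective_read_depth(8, 32) == 8);
}
```

Under that assumption the smaller side wins, so the server would never have
more than 8 reads in flight no matter how large the client's IRD is.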
> Again "many"?
I mean the quote `In practice, many RDMA providers set the rd_atom and
rd_init_atom to the same value`.
Other protocols may make different choices. Not this one.
Got it. I'll do some more tests to see if I can find the problem.
Thanks for your patience!
Great, looking forward to that.
Tom.
Tom Talpey <tom@xxxxxxxxxx> wrote on Wed, Jan 8, 2025 at 21:58:
On 1/7/2025 10:19 PM, He X wrote:
> Thanks for your review!
>
> By man page, I mean rdma_xxx man pages like
> https://linux.die.net/man/3/rdma_connect. I do mean ORD
> or IRD, just bad wording.
Ok, that's the user verb API, we're in the kernel here. Some things are
similar, but not all.
> In short, RDMA on my setup did not work. While I am digging around, I
Ok, that's important and perhaps this needs more digging. What was
your setup? Was it an iWARP connection, for example? The iWARP protocol
is stricter than IB for IRD, because it does not support "retry" when
there are insufficient resources. This is a Good Thing, by the way,
it avoids silly tail latencies. But it can cause sloppy upper layer
code to break.
If IRD/ORD is the problem, you'll see connections break when write-heavy
workloads are present. Is that what you mean by "did not work"?
> noticed that `initiator_depth` is generally set to `min(xxx,
> max_qp_init_rd_atom)` in the kernel source code. I was not aware that
> ksmbd direct does not use IRD. And many clients set them to the same value.
Again "many"? Please be specific. Clients implement protocols, and
protocols have differing requirements. An SMB3 client should advertise
an ORD == 0, and should offer at least a small IRD > 0.
An SMB3 server will do the converse - an IRD == 0 at all times, and an
ORD > 0 in response to the client's offered IRD. The resulting limits
are exchanged in the SMB Direct negotiation packets. The IRD==0 is what
you see in the very next line after your change:
>> conn_param.responder_resources = 0;
Other protocols may make different choices. Not this one.
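
The two roles described above can be sketched in user space as follows. This
is only a model under stated assumptions: struct depth_params stands in for
the two depth fields of the kernel's struct rdma_conn_param, and the function
names, the device_limit parameter and the cap parameter are hypothetical, not
taken from ksmbd or cifs.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for two fields of struct rdma_conn_param (model only). */
struct depth_params {
	uint8_t initiator_depth;	/* ORD: RDMA Reads this side may issue */
	uint8_t responder_resources;	/* IRD: RDMA Reads this side will serve */
};

static inline uint8_t min_u8(uint8_t a, uint8_t b)
{
	return a < b ? a : b;
}

/* SMB3 server: ORD > 0 so it can read write payloads, IRD == 0 always. */
static inline struct depth_params smb3_server_depths(uint8_t device_limit,
						      uint8_t cap)
{
	struct depth_params p;

	assert(cap > 0);	/* assumed: a zero cap would disable RDMA Read */
	p.initiator_depth = min_u8(device_limit, cap);
	p.responder_resources = 0;
	return p;
}

/* SMB3 client: the converse, ORD == 0 and a small IRD > 0. */
static inline struct depth_params smb3_client_depths(uint8_t device_limit,
						      uint8_t cap)
{
	struct depth_params p;

	assert(cap > 0);
	p.initiator_depth = 0;
	p.responder_resources = min_u8(device_limit, cap);
	return p;
}
```

For example, smb3_server_depths(16, 8) gives an initiator depth of 8 and
responder resources of 0, which is the shape of the conn_param setup quoted
above.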
Tom.
>
> FYI, here is the original discussion on github:
> https://github.com/namjaejeon/ksmbd/issues/497.
>
> Tom Talpey <tom@xxxxxxxxxx> wrote on Wed, Jan 8, 2025 at 05:04:
>
> On 1/5/2025 10:39 PM, He Wang wrote:
> > Field `initiator_depth` is for incoming request.
> >
> > According to the man page, `max_qp_rd_atom` is the maximum number of
> > outstanding packets, and `max_qp_init_rd_atom` is the maximum depth of
> > incoming requests.
>
> I do not believe this is correct, what "man page" are you referring to?
> The commit message is definitely wrong. Neither value is referring to
> generic "maximum packets" nor "incoming requests".
>
> The max_qp_rd_atom is the "ORD" or outgoing read/atomic request depth.
> The ksmbd server uses this to control RDMA Read requests to fetch data
> from the client for certain SMB3_WRITE operations. (SMB Direct does not
> use atomics)
>
> The max_qp_init_rd_atom is the "IRD" or incoming read/atomic request
> depth. The SMB3 protocol does not allow clients to request data from
> servers via RDMA Read. This is absolutely by design, and the server
> therefore does not use this value.
>
> In practice, many RDMA providers set the rd_atom and rd_init_atom to
> the same value, but this change would appear to break SMB Direct write
> functionality when operating over providers that do not.
>
> So, NAK.
>
> Namjae, you should revert your upstream commit.
>
> Tom.
>
> >
> > Signed-off-by: He Wang <xw897002528@xxxxxxxxx>
> > ---
> > fs/smb/server/transport_rdma.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/smb/server/transport_rdma.c b/fs/smb/server/transport_rdma.c
> > index 0ef3c9f0b..c6dbbbb32 100644
> > --- a/fs/smb/server/transport_rdma.c
> > +++ b/fs/smb/server/transport_rdma.c
> > @@ -1640,7 +1640,7 @@ static int smb_direct_accept_client(struct smb_direct_transport *t)
> > int ret;
> >
> > memset(&conn_param, 0, sizeof(conn_param));
> > - conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_rd_atom,
> > + conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_init_rd_atom,
> > SMB_DIRECT_CM_INITIATOR_DEPTH);
> > conn_param.responder_resources = 0;
> >
>
>
>
> --
> Best regards,
> xhe
--
Best regards,
xhe