On Thu, Aug 5, 2021 at 9:44 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Thu, Aug 05, 2021 at 10:38:42AM -0400, Olga Kornievskaia wrote:
> > Hi folks,
> >
> > Can somebody help me understand how RoCE (this is probably RDMA core
> > and not specific to RoCE but I'm not sure) manages destination MAC
> > addresses for its connection?
> >
> > Specifically the problem being observed is a server initiates an RDMA
> > CM disconnect (client replies), client tries to reconnect. Server
> > sends an ARP advertising a different MAC for the IP that the RDMA
> > connection was using. RDMA code keeps sending the RDMA CM connect
> > message to the old MAC for a certain period of time (90-100sec) then
> > it finally sends it to the new MAC address.
> >
> > Question: how does the core RDMA layer manage the MAC address for the
> > connection? Why does it seem like it ignores the ARP updates?
>
> RDMA objects acquire a MAC address when they are created and do not
> synchronize with the neighbor cache after.
>
> What you are seeing is that the CM_ID object holds the bad MAC until
> it is destroyed and likely a new CM_ID object gets created that holds
> the updated MAC

First of all, thank you for the feedback. A few more questions on that.

It sounds like you are agreeing with me that the ARP update is
ignored. Question: do you think that's an acceptable/expected
behaviour, or is it a bug that needs to be fixed?

Indeed, the successful CM connect request has a different communication
ID on the network trace. Question: is the period that the CMA keeps
retrying before giving up a configuration option (set by the caller
of the connection or by the system in general)? Would tuning that value
to be smaller, so that it is more sensitive to ARP updates, be the path
forward?

Thank you.

>
> Jason
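
For illustration only, here is a toy model of the behaviour described in
the quoted reply: an object copies the destination MAC once at creation
and never re-reads the neighbor cache, so only a newly created object
sees the updated MAC. The names NeighborCache and ToyCmId, the IP and
the MAC values are all hypothetical. This is not kernel or librdmacm
code, and it does not model the retry timing at all.

```python
# Toy model only: these classes are hypothetical and are NOT kernel or
# librdmacm APIs. They only mimic "snapshot the MAC at creation".


class NeighborCache:
    """Stands in for the ARP/neighbor table: IP -> MAC."""

    def __init__(self):
        self._table = {}

    def update(self, ip, mac):
        # Called when an ARP advertisement for `ip` arrives.
        self._table[ip] = mac

    def lookup(self, ip):
        return self._table[ip]


class ToyCmId:
    """Stands in for a CM_ID: copies the destination MAC once."""

    def __init__(self, cache, dst_ip):
        self.dst_ip = dst_ip
        # Snapshot taken at creation; never refreshed afterwards.
        self.dst_mac = cache.lookup(dst_ip)

    def send_connect_request(self):
        # Every retry from this object targets the snapshotted MAC.
        return self.dst_mac


cache = NeighborCache()
cache.update("192.0.2.10", "aa:aa:aa:aa:aa:01")

old_id = ToyCmId(cache, "192.0.2.10")

# The server advertises a different MAC for the same IP.
cache.update("192.0.2.10", "aa:aa:aa:aa:aa:02")

stale = old_id.send_connect_request()  # old object keeps the old MAC
new_id = ToyCmId(cache, "192.0.2.10")  # new object re-reads the cache
fresh = new_id.send_connect_request()

print("old object sends to:", stale)
print("new object sends to:", fresh)
```

In this sketch the old object keeps targeting the first MAC no matter
how many times the cache is updated, which matches the trace where the
connect request only reaches the new MAC under a different
communication ID.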