Hi Jason, On Fri, 2022-06-24 at 21:10 -0300, Jason Gunthorpe wrote: > On Fri, Jun 24, 2022 at 05:03:23PM -0700, Gerd Rausch wrote: [...] > > This in "dst_fetch_ha"?: > > --------%<--------%<--------%<--------%<--------%<-------- > > if (!(n->nud_state & NUD_VALID)) { > > neigh_event_send(n, NULL); > > ret = -ENODATA; > > --------%<--------%<--------%<--------%<--------%<-------- > > > > With: > > #define NUD_VALID (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE > > > NUD_STALE|NUD_DELAY) > > > > and the knowledge that the ARP entry is NUD_STALE, > > how would that function be called and perform the necessary refresh? > > > > > > > I think there is more going on here than this commit message is > > > explaining. > > > > > > > If the intention of the above mentioned "neigh_event_send" was to > > refresh stale entries, then the "if" condition ought to change, no? > > Maybe yes. > > If you are saying with this patch that neigh_event_send() should be > called unconditionally for every 'packet' why not remove the test > above instead of calling it twice? > It looks like starting from the very first time when "neigh_event_send" was introduced here: 935ef2d7a291 RDMA/cma: Use neigh_event_send() to start neighbour discovery it was always strictly coupled to the "-ENODATA" case. In other words, I don't see how the STALE case was covered, and I'd have to guess wether the -ENODATA coupling was intentional or accidental. Now it may be perfectly possible to just make the "neigh_event_send" call above unconditional and call it a day. I mean, simpler fixes always win over more complex ones. But, I personally don't know the twists & turns of this code well enough to be able to assert that this won't leave any non-covered conditional corner cases. I certainly hadn't tested that. > On the other hand I see many places checking for NUD_VALID before > calling it. > > So, really the commit message needs to explain how neigh_event_send() > is supposed to be used and explain that NUD_VALID is OK to drop. > Some places do check for NUD_VALID (e.g. "__ip_do_redirect"), many others don't (e.g. "__teql_resolve", or "neigh_resolve_output" itself). Even in the case of "dst_fetch_ha": I can not state whether the check for !NUD_VALUD was done intentionally or not. I can observe though that the side-effect of that check is that STALE entries won't get refreshed. So something is clearly off here. Let's look "__ip_do_redirect": I would have to guess why that check for !NUD_VALID is there. It's been there as far back as the GIT history goes. But I prefer not having to put guess-work into my Commit-Log. That said, I don't mind putting guess-work in this e-mail though: Perhaps the author(s) knew that the code would have to go through "ip_finish_output2" (or whatever the predecessor's name was) eventually, and therefore it was okay to kick off a refresh only if !NUD_VALID. At the end of the day, if the desired behavior is to always refresh STALE entries, we should do so. Calling "neigh_event_send" for !NUD_VALID only won't achieve that. > I'm having a hard time guessing from a quick look around. > Thanks for looking though, Gerd