On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote:
>
>
> On 7/8/19 11:47 AM, Dag Moxnes wrote:
> > Thanks Jason,
> >
> > Regards,
> > Dag
> >
> > On 08.07.2019 19:50, Jason Gunthorpe wrote:
> >> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
> >>> Use neighbour lock when copying MAC address from neighbour data struct
> >>> in dst_fetch_ha.
> >>>
> >>> When not using the lock, it is possible for the function to race with
> >>> neigh_update, causing it to copy an invalid MAC address.
> >>>
> >>> It is possible to provoke this error by calling rdma_resolve_addr in a
> >>> tight loop, while deleting the corresponding ARP entry in another tight
> >>> loop.
> >>>
> >>> This will cause the race shown in the following sample trace:
> >>>
> >>> rdma_resolve_addr()
> >>>   rdma_resolve_ip()
> >>>     addr_resolve()
> >>>       addr_resolve_neigh()
> >>>         fetch_ha()
> >>>           dst_fetch_ha()
> >>>             n->nud_state == NUD_VALID
> >> It isn't nud_state that is the problem here, it is the parallel
> >> memcpy's onto ha. I fixed the commit message
> >>
> >> This could also have been solved by using the ha_lock, but I don't
> >> think we have a reason to particularly over-optimize this.
>
> Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()?

Yes, that is much better, please respin this

Thanks,
Jason
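
For readers following the thread: neigh_ha_snapshot() copies n->ha under the ha_lock seqlock and retries the copy if a writer got in between, so the reader never walks away with a half-updated MAC address. The sketch below is a userspace analogue of that retry pattern using C11 atomics. It is not the kernel code; struct fake_neigh, fake_neigh_update(), fake_neigh_ha_snapshot() and HA_LEN are hypothetical names made up for illustration, and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stddef.h>

#define HA_LEN 6 /* hypothetical: Ethernet-sized hardware address */

/* Hypothetical stand-in for the kernel's struct neighbour. */
struct fake_neigh {
	atomic_uint seq;                  /* even: stable, odd: write in progress */
	_Atomic unsigned char ha[HA_LEN]; /* hardware address guarded by seq */
};

/* Writer side (plays the role of neigh_update). Assumes a single writer. */
static void fake_neigh_update(struct fake_neigh *n, const unsigned char *mac)
{
	unsigned int s = atomic_load_explicit(&n->seq, memory_order_relaxed);

	/* Make the counter odd so readers know a write is in progress. */
	atomic_store_explicit(&n->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	for (size_t i = 0; i < HA_LEN; i++)
		atomic_store_explicit(&n->ha[i], mac[i], memory_order_relaxed);

	/* Make the counter even again; this publishes the new address. */
	atomic_store_explicit(&n->seq, s + 2, memory_order_release);
}

/* Reader side: copy the address, retry if a writer interfered. */
static void fake_neigh_ha_snapshot(unsigned char *dst, struct fake_neigh *n)
{
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&n->seq, memory_order_acquire);

		for (size_t i = 0; i < HA_LEN; i++)
			dst[i] = atomic_load_explicit(&n->ha[i],
						      memory_order_relaxed);

		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&n->seq, memory_order_relaxed);
		/* Odd begin: write was in flight. begin != end: write happened
		 * during the copy. Either way the copy may be torn, so retry. */
	} while ((begin & 1u) || begin != end);
}
```

The reader takes no lock and never blocks the writer, which is why this is cheaper than taking the full neighbour lock around the memcpy, at the cost of an occasional retry.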