On Mon, Jul 08, 2019 at 10:11:29PM +0200, Dag Moxnes wrote:
>
> Den 08.07.2019 21:38, skrev Jason Gunthorpe:
> > On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote:
> > >
> > > On 7/8/19 11:47 AM, Dag Moxnes wrote:
> > > > Thanks Jason,
> > > >
> > > > Regards,
> > > > Dag
> > > >
> > > > Den 08.07.2019 19:50, skrev Jason Gunthorpe:
> > > > > On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
> > > > > > Use neighbour lock when copying MAC address from neighbour data struct
> > > > > > in dst_fetch_ha.
> > > > > >
> > > > > > When not using the lock, it is possible for the function to race with
> > > > > > neigh_update, causing it to copy an invalid MAC address.
> > > > > >
> > > > > > It is possible to provoke this error by calling rdma_resolve_addr in a
> > > > > > tight loop, while deleting the corresponding ARP entry in another tight
> > > > > > loop.
> > > > > >
> > > > > > This will cause the race shown in the following sample trace:
> > > > > >
> > > > > > rdma_resolve_addr()
> > > > > >   rdma_resolve_ip()
> > > > > >     addr_resolve()
> > > > > >       addr_resolve_neigh()
> > > > > >         fetch_ha()
> > > > > >           dst_fetch_ha()
> > > > > >             n->nud_state == NUD_VALID
> > > > > It isn't nud_state that is the problem here, it is the parallel
> > > > > memcpy's onto ha. I fixed the commit message.
> > > > >
> > > > > This could also have been solved by using the ha_lock, but I don't
> > > > > think we have a reason to particularly over-optimize this.
> > > Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()?
> > Yes, that is much better, please respin this
> OK, will do!
> Can I still post it as a v4? Or should I do it differently as you already
> applied it?

post a v4

Jason