> -----Original Message----- > From: Jason Gunthorpe > Sent: Wednesday, September 5, 2018 4:52 PM > To: Ruhl, Michael J <michael.j.ruhl@xxxxxxxxx> > Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Doug Ledford > <dledford@xxxxxxxxxx>; Leon Romanovsky <leonro@xxxxxxxxxxxx>; RDMA > mailing list <linux-rdma@xxxxxxxxxxxxxxx>; Daniel Jurgens > <danielj@xxxxxxxxxxxx>; Parav Pandit <parav@xxxxxxxxxxxx>; Yuval Shaia > <yuval.shaia@xxxxxxxxxx> > Subject: Re: [PATCH rdma-next 01/11] RDMA/core Introduce and use > rdma_find_ndev_for_src_ip_rcu > > On Wed, Sep 05, 2018 at 09:36:43PM +0000, Ruhl, Michael J wrote: > > >From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma- > > >owner@xxxxxxxxxxxxxxx] On Behalf Of Leon Romanovsky > > >Sent: Wednesday, September 5, 2018 5:54 AM > > >To: Doug Ledford <dledford@xxxxxxxxxx>; Jason Gunthorpe > > ><jgg@xxxxxxxxxxxx> > > >Cc: Leon Romanovsky <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux- > > >rdma@xxxxxxxxxxxxxxx>; Daniel Jurgens <danielj@xxxxxxxxxxxx>; Parav > > >Pandit <parav@xxxxxxxxxxxx>; Yuval Shaia <yuval.shaia@xxxxxxxxxx> > > >Subject: [PATCH rdma-next 01/11] RDMA/core Introduce and use > > >rdma_find_ndev_for_src_ip_rcu > > > > > >From: Parav Pandit <parav@xxxxxxxxxxxx> > > > > > >Commit in fixes tag introduced two issues. > > >1. When address family is other than IPv4 or v6, rdma_translate_ip() > > >returns success which is incorrect. > > >2. When address familty is AF_INET6, and if the source address is not > > >found, it returns success, which is also incorrect. > > > > > >Therefore, introduce and use rdma_find_ndev_for_src_ip_rcu() helper > > >function which returns correct success or error status and is also > > >useful for future code refactor in addr_resolve(). > > > > > >Fixes: e08ce2e82b2f ("RDMA/core: Make function rdma_copy_addr return > > >void") > > >Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> > > >Reviewed-by: Daniel Jurgens <danielj@xxxxxxxxxxxx> > > >Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > >drivers/infiniband/core/addr.c | 61 > > >++++++++++++++++++++++++------------- > > > 1 file changed, 35 insertions(+), 26 deletions(-) > > > > > >diff --git a/drivers/infiniband/core/addr.c > > >b/drivers/infiniband/core/addr.c index 94ff38731be8..50ab50f1908b > > >100644 > > >+++ b/drivers/infiniband/core/addr.c > > >@@ -232,6 +232,36 @@ void rdma_copy_addr(struct rdma_dev_addr > > >*dev_addr, } EXPORT_SYMBOL(rdma_copy_addr); > > > > > >+static struct net_device * > > >+rdma_find_ndev_for_src_ip_rcu(struct net *net, const struct sockaddr > > >*src_in) > > >+{ > > >+ struct net_device *dev = NULL; > > >+ int ret = -EADDRNOTAVAIL; > > >+ > > >+ switch (src_in->sa_family) { > > >+ case AF_INET: > > >+ dev = __ip_dev_find(net, > > >+ ((const struct sockaddr_in *)src_in)- > > >>sin_addr.s_addr, > > >+ false); > > > > __ip_dev_find() does an rcu_read_lock()/rcu_read_unlock() sequence. > > > > Is taking it before calling rdma_find_ndev_for_src_ip_rcu() the right thing to > do? > > rcu is safe to nest.. > > But I can't quite remember why we needed to hold the rcu there across > rdma_copy_addr, as we do have the dev_hold at that point.. Maybe that use of > RCU is overly pessimistic? > > Parav? > I was replying but will continue in this thread now. > Is taking it before calling rdma_find_ndev_for_src_ip_rcu() the right thing to do? Likely because rcu_read_lock can be nested as per [1]. If we do rcu_read_lock() inside rdma_find_ndev_for_src_ip_rcu(), we need to dev_hold and dev_put() and handle code differently between ipv4 and v6. However, dev_hold() and dev_put() doesn't guarantee that netdev fields won't change such as ifindex. It just ensures that netdev exist. So assuming rcu_read_lock is inside rdma_find_ndev_for_src_ip_rcu, rdma_copy_src_l2_addr() again need to take rcu lock unlock while copying ifindex and in between netdev is free to change things. So while using nested rcu seems sub-optimal, it does offers benefit (a) to avoid such racy conditions (though very corner cases), (b) common code for ipv4, v6, (c) avoids dev_hold, dev_put. [1] https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt Snippet: It is possible to nest rcu_read_lock(), since reader-writer locks may be recursively acquired.