On Thu, Mar 15, 2018 at 6:08 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote: > On Thu, Mar 15, 2018 at 05:06:08PM +0200, Erez Shitrit wrote: > >> The origin issue was that after changing to CM mode traffic might >> stopped for very long time (depends of the arp time, at least 30 sec). >> Now, if we move the line after the ipoib_flush_paths() call, the >> problem is much smaller: >> only while ipoib_flush_paths() runs, packet that sent to CM >> connection after packet from UD/GSO will be dropped. > > sure, that is prett clear, and the above would be a much better commit > message. > >> The question is does this something that we really need to handle? >> the error flow in CM mode already does this (and for other more often >> error flows like RNR etc.) >> and also we are talking about a case that is unlikely in the real life >> of ipoib driver, mode changing is something that done once at the >> beginning. should we add a code for this rare case? > > If you don't want to deal with it then a big comment is needed here to > explain that it is racey but we don't care for XYZ reasons. > > And only do that if fixing it properly is really hard.. but it doesn't > look too hard.. > > What about something like this.. I didn't check too carefully but it > seems to hold the right locks: > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > index 161ba8c76285cb..bdf7a723f52818 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > @@ -711,7 +711,7 @@ static void push_pseudo_header(struct sk_buff *skb, const char *daddr) > memcpy(phdr->hwaddr, daddr, INFINIBAND_ALEN); > } > > -void ipoib_flush_paths(struct net_device *dev) > +void ipoib_flush_paths(struct net_device *dev, bool cm_mode) > { > struct ipoib_dev_priv *priv = ipoib_priv(dev); > struct ipoib_path *path, *tp; > @@ -737,6 +737,20 @@ void ipoib_flush_paths(struct net_device *dev) > spin_lock_irqsave(&priv->lock, flags); > } > > + /* > + * Mode change is done atomically with list flushing, we stay in the > + * old mode until the list is empty then atomically switch modes for > + * new neighs from then on. > + */ > + if (cm_mode) { > + priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM; > + priv->tx_wr.wr.opcode = IB_WR_SEND; > + set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); > + } else { > + // .. fixme set tx_wr properly for !CM mode > + clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); > + } > + This will work. But, IMHO seems to me that this is not the right place, the function is used for other flows, like taking the device down, etc. place where the mode is not relevant. As you suggested I will add a detailed explain for the possible race and why we can leave with it. > spin_unlock_irqrestore(&priv->lock, flags); > netif_tx_unlock_bh(dev); > } > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html