On Sun, Mar 18, 2018 at 11:52:33AM +0200, Erez Shitrit wrote: > On Thu, Mar 15, 2018 at 6:08 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > On Thu, Mar 15, 2018 at 05:06:08PM +0200, Erez Shitrit wrote: > > > >> The origin issue was that after changing to CM mode traffic might > >> stopped for very long time (depends of the arp time, at least 30 sec). > >> Now, if we move the line after the ipoib_flush_paths() call, the > >> problem is much smaller: > >> only while ipoib_flush_paths() runs, packet that sent to CM > >> connection after packet from UD/GSO will be dropped. > > > > sure, that is prett clear, and the above would be a much better commit > > message. > > > >> The question is does this something that we really need to handle? > >> the error flow in CM mode already does this (and for other more often > >> error flows like RNR etc.) > >> and also we are talking about a case that is unlikely in the real life > >> of ipoib driver, mode changing is something that done once at the > >> beginning. should we add a code for this rare case? > > > > If you don't want to deal with it then a big comment is needed here to > > explain that it is racey but we don't care for XYZ reasons. > > > > And only do that if fixing it properly is really hard.. but it doesn't > > look too hard.. > > > > What about something like this.. I didn't check too carefully but it > > seems to hold the right locks: > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > > index 161ba8c76285cb..bdf7a723f52818 100644 > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > > @@ -711,7 +711,7 @@ static void push_pseudo_header(struct sk_buff *skb, const char *daddr) > > memcpy(phdr->hwaddr, daddr, INFINIBAND_ALEN); > > } > > > > -void ipoib_flush_paths(struct net_device *dev) > > +void ipoib_flush_paths(struct net_device *dev, bool cm_mode) > > { > > struct ipoib_dev_priv *priv = ipoib_priv(dev); > > struct ipoib_path *path, *tp; > > @@ -737,6 +737,20 @@ void ipoib_flush_paths(struct net_device *dev) > > spin_lock_irqsave(&priv->lock, flags); > > } > > > > + /* > > + * Mode change is done atomically with list flushing, we stay in the > > + * old mode until the list is empty then atomically switch modes for > > + * new neighs from then on. > > + */ > > + if (cm_mode) { > > + priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM; > > + priv->tx_wr.wr.opcode = IB_WR_SEND; > > + set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); > > + } else { > > + // .. fixme set tx_wr properly for !CM mode > > + clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); > > + } > > + > > This will work. > But, IMHO seems to me that this is not the right place, the function > is used for other flows, like taking the device down, etc. place where > the mode is not relevant. It *IS* the right place because it is the only place that is atomic with the list flush due to the locking. It clearly cnanot be pulled out of that locking region correctly. It seems fine to me to have the argument and default it to test_bit(IPOIB_FLAG_ADMIN_CM), maybe with a helper/wrapper to do it. Nothing about this is performance. > As you suggested I will add a detailed explain for the possible race > and why we can leave with it. Well, now that we have an actually correct solution that is trivial, let us just do it right please? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html