On Wed, Apr 26, 2017 at 10:25:04AM -0400, Doug Ledford wrote:
> On Wed, 2017-04-26 at 22:11 +0800, Honggang LI wrote:
> > On Wed, Apr 26, 2017 at 09:50:38AM -0400, Doug Ledford wrote:
> > >
> > > On Wed, 2017-04-26 at 09:48 -0400, Doug Ledford wrote:
> > > >
> > > > On Wed, 2017-04-26 at 21:33 +0800, Honggang LI wrote:
> > > > >
> > > > >
> > > > > Yes, it is during the process of removing the final slave. The
> > > > > reproducer looks like this:
> > > > >
> > > > > ping remote_ip_over_bonding_interface &
> > > > > while 1; do
> > > > > ifdown bond0
> > > > > ifup bond0
> > > > > done
> > >
> > > BTW, rerunning your test as:
> > >
> > > ping remote_ip_over_bonding_interface &
> > > while 1; do
> > > echo -n "Downing interface..."
> > > ifdown bond0
> >
> > Confirmed panic in here.
>
> OK, this leads me to suspect that the bonding driver is possibly
> reconfiguring the skb setup to Ethernet mode before the last slave is
> fully dropped from the bond, and so the slave is seeing a packet to its
> hard header routine with the wrong headroom. I think Paolo's patch is
> probably the best way to go. The reason I say that is because we can't
> definitively say that this is the only pathway we will ever see that
> causes us to get a packet with Ethernet headroom on our IPoIB
> interface, and Paolo's patch gives us the possibility of recovering and
> sending the packet out. Even in this case, we are downing the
> interface, but it isn't down entirely yet, so sending the packet out

Sending packets out over an interface that is being removed looks bad
to me. I would rather drop the backlogged packets.

> isn't necessarily wrong. So a method of fix that allows us to
> recover/continue seems preferable to me. Can you test Paolo's patch
> and see if it resolves the problem?

Yes, Paolo's patch fixes the panic too. But it seems to have a side
effect. For example, in arp_create() (net/ipv4/arp.c):

struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
.....
		return NULL;

	skb_reserve(skb, hlen);
	skb_reset_network_header(skb);	<-- this updates skb->network_header

But skb_cow_head() does not seem to touch skb->network_header. I should
say that I have not tested the network_header.

>
> --
> Doug Ledford <dledford@xxxxxxxxxx>
>     GPG KeyID: B826A3330E572FDD
>
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html