On Thu, Jan 06, 2011 at 11:22:42AM +0100, Eric Dumazet wrote: > Le jeudi 06 janvier 2011 Ã 18:33 +0900, Simon Horman a Ãcrit : > > Hi, > > > > Back in October I reported that I noticed a problem whereby flow control > > breaks down when openvswitch is configured to mirror a port[1]. > > > > I have (finally) looked into this further and the problem appears to relate > > to cloning of skbs, as Jesse Gross originally suspected. > > > > More specifically, in do_execute_actions[2] the first n-1 times that an skb > > needs to be transmitted it is cloned first and the final time the original > > skb is used. > > > > In the case that there is only one action, which is the normal case, then > > the original skb will be used. But in the case of mirroring the cloning > > comes into effect. And in my case the cloned skb seems to go to the (slow) > > eth1 interface while the original skb goes to the (fast) dummy0 interface > > that I set up to be a mirror. The result is that dummy0 "paces" the flow, > > and its a cracking pace at that. > > > > As an experiment I hacked do_execute_actions() to use the original skb > > for the first action instead of the last one. In my case the result was > > that eth1 "paces" the flow, and things work reasonably nicely. > > > > Well, sort of. Things work well for non-GSO skbs but extremely poorly for > > GSO skbs where only 3 (yes 3, not 3%) end up at the remote host running > > netserv. I'm unsure why, but I digress. > > > > It seems to me that my hack illustrates the point that the flow ends up > > being "paced" by one interface. However I think that what would be > > desirable is that the flow is "paced" by the slowest link. Unfortunately > > I'm unsure how to achieve that. > > > > Hi Simon ! > > "pacing" is done because skb is attached to a socket, and a socket has a > limited (but configurable) sndbuf. sk->sk_wmem_alloc is the current sum > of all truesize skbs in flight. > > When you enter something that : > > 1) Get a clone of the skb, queue the clone to device X > 2) queue the original skb to device Y > > Then : Socket sndbuf is not affected at all by device X queue. > This is speed on device Y that matters. > > You want to get servo control on both X and Y > > You could try to > > 1) Get a clone of skb > Attach it to socket too (so that socket get a feedback of final > orphaning for the clone) with skb_set_owner_w() > queue the clone to device X > > Unfortunatly, stacked skb->destructor() makes this possible only for > known destructor (aka sock_wfree()) Hi Eric ! Thanks for the advice. I had thought about the socket buffer but at some point it slipped my mind. In any case the following patch seems to implement the change that I had in mind. However my discussions Michael Tsirkin elsewhere in this thread are beginning to make me think that think that perhaps this change isn't the best solution. diff --git a/datapath/actions.c b/datapath/actions.c index 5e16143..505f13f 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -384,7 +384,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, for (a = actions, rem = actions_len; rem > 0; a = nla_next(a, &rem)) { if (prev_port != -1) { - do_output(dp, skb_clone(skb, GFP_ATOMIC), prev_port); + struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC); + if (nskb) { + if (skb->sk) + skb_set_owner_w(nskb, skb->sk); + do_output(dp, nskb, prev_port); + } prev_port = -1; } I got a rather nasty panic without the if (skb->sk), I guess some skbs don't have a socket. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html