On Sat, Nov 15, 2014 at 10:41 AM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
> On Fri, Nov 14, 2014 at 4:50 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Fri, Nov 14, 2014 at 4:40 PM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
>>> On Fri, Nov 14, 2014 at 4:24 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>> On Fri, Nov 14, 2014 at 4:06 PM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
>>>>> On Fri, Nov 14, 2014 at 2:10 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>> On Fri, Nov 14, 2014 at 1:36 PM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
>>>>>>> On Fri, Nov 14, 2014 at 12:34 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>>>> On Fri, Nov 14, 2014 at 12:25 PM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
>>>>>>>>> On Fri, Nov 14, 2014 at 12:16 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>>>>>> On Fri, Nov 14, 2014 at 11:52 AM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
>>>>>>>>>>> On Fri, Nov 14, 2014 at 11:33 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>>>>>>>>>>>> On Fri, 2014-11-14 at 09:17 -0800, Andy Lutomirski wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> As a heavy user of RFS (and finder of bugs in it, too), here's my
>>>>>>>>>>>>> question about this API:
>>>>>>>>>>>>>
>>>>>>>>>>>>> How does an application tell whether the socket represents a
>>>>>>>>>>>>> non-actively-steered flow? If the flow is subject to RFS, then moving
>>>>>>>>>>>>> the application handling to the socket's CPU seems problematic, as the
>>>>>>>>>>>>> socket's CPU might move as well. The current implementation in this
>>>>>>>>>>>>> patch seems to tell me which CPU the most recent packet came in on,
>>>>>>>>>>>>> which is not necessarily very useful.
>>>>>>>>>>>>
>>>>>>>>>>>> It's the cpu that hit the TCP stack, bringing dozens of cache lines in
>>>>>>>>>>>> its cache. This is all that matters.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Some possibilities:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Let SO_INCOMING_CPU fail if RFS or RPS are in play.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, idea is to not use RFS at all. Otherwise, it is useless.
>>>>>>>>>>
>>>>>>>>>> Sure, but how do I know that it'll be the same CPU next time?
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Bear in mind this is only an interface to report RX CPU and in itself
>>>>>>>>>>> doesn't provide any functionality for changing scheduling, there is
>>>>>>>>>>> obviously logic needed in user space that would need to do something.
>>>>>>>>>>>
>>>>>>>>>>> If we track the interrupting CPU in skb, the interface could be easily
>>>>>>>>>>> extended to provide the interrupting CPU, the RPS CPU (calculated at
>>>>>>>>>>> reported time), and the CPU processing transport (post steering which
>>>>>>>>>>> is what is currently returned). That would provide the complete
>>>>>>>>>>> picture to control scheduling a flow from userspace, and an interface
>>>>>>>>>>> to selectively turn off RFS for a socket would make sense then.
>>>>>>>>>>
>>>>>>>>>> I think that a turn-off-RFS interface would also want a way to figure
>>>>>>>>>> out where the flow would go without RFS. Can the network stack do
>>>>>>>>>> that (e.g. evaluate the rx indirection hash or whatever happens these
>>>>>>>>>> days)?
>>>>>>>>>>
>>>>>>>>> Yes. We need the rxhash and the CPU that packets are received on from
>>>>>>>>> the device for the socket. The former we already have, the latter
>>>>>>>>> might be done by adding a field to skbuff to set received CPU. Given
>>>>>>>>> the L4 hash and interrupting CPU we can calculate the RPS CPU which
>>>>>>>>> is where packet would have landed with RFS off.
>>>>>>>>
>>>>>>>> Hmm. I think this would be useful for me. It would *definitely* be
>>>>>>>> useful for me if I could pin an RFS flow to a cpu of my choice.
>>>>>>>>
>>>>>>> Andy, can you elaborate a little more on your use case.
>>>>>>> I've thought several times about an interface to program the flow table from
>>>>>>> userspace, but never quite came up with a compelling use case and
>>>>>>> there is the security concern that a user could "steal" cycles from
>>>>>>> arbitrary CPUs.
>>>>>>
>>>>>> I have a bunch of threads that are pinned to various CPUs or groups of
>>>>>> CPUs. Each thread is responsible for a fixed set of flows. I'd like
>>>>>> those flows to go to those CPUs.
>>>>>>
>>>>>> RFS will eventually do it, but it would be nice if I could
>>>>>> deterministically ask for a flow to be routed to the right CPU. Also,
>>>>>> if my thread bounces temporarily to another CPU, I don't really need
>>>>>> the flow to follow it -- I'd like it to stay put.
>>>>>>
>>>>> Okay, how about if we have a SO_RFS_LOCK_FLOW sockopt. When this is
>>>>> called on a socket we can lock the socket to CPU binding to the
>>>>> current CPU it is called from. It could be unlocked at a later point.
>>>>> Would this satisfy your requirements?
>>>>
>>>> Yes, I think. Especially if it bypassed the hash table.
>>>
>>> Unfortunately we can't easily bypass the hash table. The only way I
>>> know of to do that is to perform the socket lookup to do steering
>>> (I tried that early on, but it was pretty costly).
>>
>> What happens if you just call ndo_rx_flow_steer and do something to
>> keep the result from expiring?
>>
> Okay, I will look at that. Do you know how many flows we are talking
> about, both in the number you need and the number that can be put in
> the HW without collision?

I think I have on the order of 100 flows. Maybe 400. It's been a few
months since I checked, and this particular metric is much easier to
measure on a weekday.

I think the HW has several thousand slots.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
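
For anyone following along, here is a rough sketch of the userspace side
of the SO_INCOMING_CPU idea discussed in this thread: ask the socket
which CPU last ran the transport code for it, then hand the socket to a
worker tied to that CPU. It assumes the option is read with getsockopt()
at SOL_SOCKET level and returns an int, which is how it later appeared
in mainline. The fallback value 49 is the asm-generic number and is an
assumption for headers that predate the option (some architectures use a
different number). The helper names query_incoming_cpu() and
pick_worker() and the "cpu modulo worker count" mapping are made up for
illustration and are not part of any patch in this thread.

```c
#include <sys/socket.h>

/* Assumption: asm-generic value, only used if the libc headers lack it. */
#ifndef SO_INCOMING_CPU
#define SO_INCOMING_CPU 49
#endif

/*
 * Hypothetical helper: return the CPU reported for this socket,
 * or -1 if the kernel rejects the option or fd is invalid.
 */
static int query_incoming_cpu(int fd)
{
    int cpu = -1;
    socklen_t len = sizeof(cpu);

    if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) < 0)
        return -1;
    return cpu;
}

/*
 * Hypothetical policy: map a reported CPU onto one of nworkers
 * CPU-pinned worker threads. Unknown CPU falls back to worker 0.
 * Returns -1 if there are no workers.
 */
static int pick_worker(int cpu, int nworkers)
{
    if (nworkers <= 0)
        return -1;
    if (cpu < 0)
        return 0;
    return cpu % nworkers;
}
```

A caller would do something like pick_worker(query_incoming_cpu(fd),
nworkers) right after accept(). As noted earlier in the thread, the
answer is only stable if RFS is not moving the flow around, so this is
a hint for placement and not a guarantee.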