Re: [PATCH net-next] net: introduce SO_INCOMING_CPU

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Sat, 15 Nov 2014 10:47:02 -0800

On Sat, Nov 15, 2014 at 10:41 AM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
> On Fri, Nov 14, 2014 at 4:50 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Fri, Nov 14, 2014 at 4:40 PM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
>>> On Fri, Nov 14, 2014 at 4:24 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>> On Fri, Nov 14, 2014 at 4:06 PM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
>>>>> On Fri, Nov 14, 2014 at 2:10 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>> On Fri, Nov 14, 2014 at 1:36 PM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
>>>>>>> On Fri, Nov 14, 2014 at 12:34 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>>>> On Fri, Nov 14, 2014 at 12:25 PM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
>>>>>>>>> On Fri, Nov 14, 2014 at 12:16 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>>>>>> On Fri, Nov 14, 2014 at 11:52 AM, Tom Herbert <therbert@xxxxxxxxxx> wrote:
>>>>>>>>>>> On Fri, Nov 14, 2014 at 11:33 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>>>>>>>>>>>> On Fri, 2014-11-14 at 09:17 -0800, Andy Lutomirski wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> As a heavy user of RFS (and finder of bugs in it, too), here's my
>>>>>>>>>>>>> question about this API:
>>>>>>>>>>>>>
>>>>>>>>>>>>> How does an application tell whether the socket represents a
>>>>>>>>>>>>> non-actively-steered flow?  If the flow is subject to RFS, then moving
>>>>>>>>>>>>> the application handling to the socket's CPU seems problematic, as the
>>>>>>>>>>>>> socket's CPU might move as well.  The current implementation in this
>>>>>>>>>>>>> patch seems to tell me which CPU the most recent packet came in on,
>>>>>>>>>>>>> which is not necessarily very useful.
>>>>>>>>>>>>
>>>>>>>>>>>> Its the cpu that hit the TCP stack, bringing dozens of cache lines in
>>>>>>>>>>>> its cache. This is all that matters,
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Some possibilities:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Let SO_INCOMING_CPU fail if RFS or RPS are in play.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, idea is to not use RFS at all. Otherwise, it is useless.
>>>>>>>>>>
>>>>>>>>>> Sure, but how do I know that it'll be the same CPU next time?
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Bear in mind this is only an interface to report RX CPU and in itself
>>>>>>>>>>> doesn't provide any functionality for changing scheduling, there is
>>>>>>>>>>> obviously logic needed in user space that would need to do something.
>>>>>>>>>>>
>>>>>>>>>>> If we track the interrupting CPU in skb, the interface could be easily
>>>>>>>>>>> extended to provide the interrupting CPU, the RPS CPU (calculated at
>>>>>>>>>>> reported time), and the CPU processing transport (post steering which
>>>>>>>>>>> is what is currently returned). That would provide the complete
>>>>>>>>>>> picture to control scheduling a flow from userspace, and an interface
>>>>>>>>>>> to selectively turn off RFS for a socket would make sense then.
>>>>>>>>>>
>>>>>>>>>> I think that a turn-off-RFS interface would also want a way to figure
>>>>>>>>>> out where the flow would go without RFS.  Can the network stack do
>>>>>>>>>> that (e.g. evaluate the rx indirection hash or whatever happens these
>>>>>>>>>> days)?
>>>>>>>>>>
>>>>>>>>> Yes,. We need the rxhash and the CPU that packets are received on from
>>>>>>>>> the device for the socket. The former we already have, the latter
>>>>>>>>> might be done by adding a field to skbuff to set received CPU. Given
>>>>>>>>> the L4 hash and interrupting CPU we can calculated the RPS CPU which
>>>>>>>>> is where packet would have landed with RFS off.
>>>>>>>>
>>>>>>>> Hmm.  I think this would be useful for me.  It would *definitely* be
>>>>>>>> useful for me if I could pin an RFS flow to a cpu of my choice.
>>>>>>>>
>>>>>>> Andy, can you elaborate a little more on your use case. I've thought
>>>>>>> several times about an interface to program the flow table from
>>>>>>> userspace, but never quite came up with a compelling use case and
>>>>>>> there is the security concern that a user could "steal" cycles from
>>>>>>> arbitrary CPUs.
>>>>>>
>>>>>> I have a bunch of threads that are pinned to various CPUs or groups of
>>>>>> CPUs.  Each thread is responsible for a fixed set of flows.  I'd like
>>>>>> those flows to go to those CPUs.
>>>>>>
>>>>>> RFS will eventually do it, but it would be nice if I could
>>>>>> deterministically ask for a flow to be routed to the right CPU.  Also,
>>>>>> if my thread bounces temporarily to another CPU, I don't really need
>>>>>> the flow to follow it -- I'd like it to stay put.
>>>>>>
>>>>> Okay, how about it we have a SO_RFS_LOCK_FLOW sockopt. When this is
>>>>> called on a socket we can lock the socket to CPU binding to the
>>>>> current CPU it is called from. It could be unlocked at a later point.
>>>>> Would this satisfy your requirements?
>>>>
>>>> Yes, I think.  Especially if it bypassed the hash table.
>>>
>>> Unfortunately we can't easily bypass the hash table. The only way I
>>> know of to to do that is to perform the socket lookup to do steering
>>> (I tried that early on, but it was pretty costly).
>>
>> What happens if you just call ndo_rx_flow_steer and do something to
>> keep the result from expiring?
>>
> Okay, I will look at that. Do you know how many flows we are talking
> about, both in the number you need and the number that can be put in
> the HW without collision?

I think I have on the order of 100 flows.  Maybe 400.  It's been a few
months since I checked, and this particular metric is much easier to
measure on a weekday.

I think the HW has several thousand slots.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html