Re: Lightweight packet timestamping

On 6/16/20 10:00 AM, Jesper Dangaard Brouer wrote:
> On Wed, 10 Jun 2020 23:09:34 +0200
> Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> 
>> Federico Parola <fede.parola@xxxxxxxxxx> writes:
>>
>>> On 06/06/20 01:34, David Ahern wrote:  
>>>> On 6/4/20 7:30 AM, Federico Parola wrote:  
>>>>> Hello everybody,
>>
>>>>> I'm implementing a token bucket algorithm to apply rate limiting to
>>>>> traffic, and I need the timestamp of packets to update the bucket.
>>>>> To get this information I'm using the bpf_ktime_get_ns() helper,
>>>>> but I've discovered it has a non-negligible impact on performance.
>>>>> I've seen there is work in progress to make hardware timestamps
>>>>> available to XDP programs, but I don't know if this feature is
>>>>> already available. Is there a faster way to retrieve this
>>>>> information?
>>
>>>>> Thanks for your attention.
>>>>>  
>>>> bpf_ktime_get_ns should be fairly light. What kind of performance loss
>>>> are you seeing with it?  
>>>
>>> I've run some tests on a program forwarding packets between two
>>> interfaces and applying rate limiting: using bpf_ktime_get_ns() I can
>>> process up to 3.84 Mpps; if I replace the helper with a lookup on a
>>> map containing the current timestamp, updated in user space, I go up
>>> to 4.48 Mpps.
> 
> ((1/3.84*1000)-(1/4.48*1000) = 37.20 ns overhead)

I did the same math yesterday and ran some tests as well. I am really
surprised the timestamp helper's overhead is that high.
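
For reference, the per-packet pattern being measured is roughly the
following (untested sketch on my side; the map layout, rate constants and
names are assumptions, not the actual program from this thread):

/* Hypothetical token-bucket check. The bpf_ktime_get_ns() call is the
 * per-packet cost under discussion. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define RATE_BPS	(100 * 1000 * 1000ULL)	/* example: 100 MB/s */
#define BURST_BYTES	(256 * 1024ULL)

struct bucket {
	__u64 tokens;	/* bytes currently available */
	__u64 last_ns;	/* timestamp of the last refill */
};

/* per-CPU bucket: no locking needed, but the limit applies per RX CPU */
struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, struct bucket);
} tb SEC(".maps");

SEC("xdp")
int rate_limit(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	__u64 pkt_len = data_end - data;
	__u32 key = 0;
	struct bucket *b;
	__u64 now;

	b = bpf_map_lookup_elem(&tb, &key);
	if (!b)
		return XDP_ABORTED;

	/* The per-packet helper call whose ~37 ns cost is being measured;
	 * the alternative is reading a coarser timestamp from a map. */
	now = bpf_ktime_get_ns();
	if (!b->last_ns)
		b->last_ns = now;

	/* refill at RATE_BPS, capped at BURST_BYTES */
	b->tokens += (now - b->last_ns) * RATE_BPS / 1000000000ULL;
	if (b->tokens > BURST_BYTES)
		b->tokens = BURST_BYTES;
	b->last_ns = now;

	if (b->tokens < pkt_len)
		return XDP_DROP;

	b->tokens -= pkt_len;
	return XDP_PASS;	/* or redirect to the egress interface */
}

char _license[] SEC("license") = "GPL";

The bucket is per-CPU only to avoid atomics in the sketch, so the
configured rate effectively applies per RX CPU.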

> 
> I was about to suggest doing something close to this.  That is, only call
> bpf_ktime_get_ns() once per NAPI poll-cycle and store the timestamp in
> a map, if you don't need super-high per-packet precision.  You can
> even use a per-CPU map to store the info (to avoid cross-CPU cache
> traffic), because softirq keeps RX-processing pinned to a CPU.
> 
> It sounds like you update the timestamp from userspace; is that true?
> (Quote: "current timestamp updated in user space")
> 
> I would suggest that you leverage the softirq tracepoints (use
> SEC("raw_tracepoint/") for low overhead), e.g. irq:softirq_entry
> (see where the kernel calls trace_softirq_entry), to update the map once
> per NAPI/net_rx_action. I have a bpftrace-based tool[1] that measures

I have code that measures the overhead of net_rx_action:
    https://github.com/dsahern/bpf-progs/blob/master/ksrc/net_rx_action.c

This use case would just need the entry probe.


> network-softirq latency, e.g. the time it takes from "softirq_raise" until
> it is run at "softirq_entry".  You can leverage ideas from that script,
> like 'vec == 3' being NET_RX_SOFTIRQ, to limit this to networking.
> 
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/softirq_net_latency.bt
> 
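
For the XDP side, a minimal version of that idea could look roughly like
this (untested sketch, names are mine; the timestamp could equally be
published from the entry probe of net_rx_action, as in the code above):

/* Publish a coarse per-CPU timestamp once per NET_RX softirq and read
 * it from XDP instead of calling bpf_ktime_get_ns() per packet. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define NET_RX_SOFTIRQ	3	/* the 'vec == 3' mentioned above */

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} cur_ns SEC(".maps");

/* irq:softirq_entry raw tracepoint: args[0] is the softirq vector */
SEC("raw_tracepoint/softirq_entry")
int update_ts(struct bpf_raw_tracepoint_args *ctx)
{
	__u32 key = 0;
	__u64 now;

	if (ctx->args[0] != NET_RX_SOFTIRQ)
		return 0;

	now = bpf_ktime_get_ns();
	bpf_map_update_elem(&cur_ns, &key, &now, BPF_ANY);
	return 0;
}

SEC("xdp")
int rate_limit(struct xdp_md *ctx)
{
	__u32 key = 0;
	/* RX processing stays on the CPU that ran the softirq_entry
	 * update, so a plain per-CPU lookup is enough. */
	__u64 *now = bpf_map_lookup_elem(&cur_ns, &key);

	if (!now)
		return XDP_ABORTED;

	/* ... refill the token bucket with *now as before ... */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Timestamp precision then becomes "start of the current poll cycle", which
should be fine if per-packet accuracy isn't needed for the bucket refill.
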
>> Can you share more details on the platform you're running this on?
>> I.e., CPU and chipset details, network driver, etc.
> 
> Yes, please.  I plan to work on the XDP feature of extracting hardware
> offload info from the driver's descriptors, like timestamps, vlan,
> rss-hash, checksum, etc.  If you tell me which NIC driver you are using,
> I can make sure to include it in the supported drivers.
> 



