Re: [PATCH net-next v2 06/12] net-timestamp: introduce TS_SCHED_OPT_CB to generate dev xmit timestamp

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/15/24 6:24 PM, Jason Xing wrote:
On Wed, Oct 16, 2024 at 9:01 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:

On 10/11/24 9:06 PM, Jason Xing wrote:
From: Jason Xing <kernelxing@xxxxxxxxxxx>

Introduce BPF_SOCK_OPS_TS_SCHED_OPT_CB flag so that we can decide to
print timestamps when the skb just passes the dev layer.

Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx>
---
   include/uapi/linux/bpf.h       |  5 +++++
   net/core/skbuff.c              | 17 +++++++++++++++--
   tools/include/uapi/linux/bpf.h |  5 +++++
   3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 157e139ed6fc..3cf3c9c896c7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7019,6 +7019,11 @@ enum {
                                        * by the kernel or the
                                        * earlier bpf-progs.
                                        */
+     BPF_SOCK_OPS_TS_SCHED_OPT_CB,   /* Called when skb is passing through
+                                      * dev layer when SO_TIMESTAMPING
+                                      * feature is on. It indicates the
+                                      * recorded timestamp.
+                                      */
   };

   /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3a4110d0f983..16e7bdc1eacb 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5632,8 +5632,21 @@ static void bpf_skb_tstamp_tx_output(struct sock *sk, int tstype)
               return;

       tp = tcp_sk(sk);
-     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG))
-             return;
+     if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)) {
+             struct timespec64 tstamp;
+             u32 cb_flag;
+
+             switch (tstype) {
+             case SCM_TSTAMP_SCHED:
+                     cb_flag = BPF_SOCK_OPS_TS_SCHED_OPT_CB;
+                     break;
+             default:
+                     return;
+             }
+
+             tstamp = ktime_to_timespec64(ktime_get_real());
+             tcp_call_bpf_2arg(sk, cb_flag, tstamp.tv_sec, tstamp.tv_nsec);

There is bpf_ktime_get_*() helper. The bpf prog can directly call the
bpf_ktime_get_* helper and use whatever clock it sees fit instead of enforcing
real clock here and doing an extra ktime_to_timespec64. Right now the
bpf_ktime_get_*() does not have real clock which I think it can be added.

In this way, there is no need to add tcp_call_bpf_*arg() to pass
timestamp to userspace, right? Let the bpf program implement it.

Now I wonder what information I should pass? Sorry for the lack of BPF
related knowledge :(

Just pass the cb_flag op in this case.

A bpf selftest is missing in this series to show how it is going to be used. Yes, there are existing socket API tests on time stamping but I believe this discussion has already shown some subtle differences that warrant a closer to real world bpf prog example first.



I think overall the tstamp reporting interface does not necessarily have to
follow the socket API. The bpf prog is running in the kernel. It could pass
other information to the bpf prog if it sees fit. e.g. the bpf prog could also
get the original transmitted tcp skb if it is useful.

Good to know that! But how the BPF program parses the skb by using
tcp_call_bpf_2arg() which only passes u32 parameters.

"struct skbuff *skb" has already been added to "struct bpf_sock_ops_kern". It is only assigned during the "BPF_SOCK_OPS_PARSE_*HDR_CB". It is not exposed directly to bpf prog but it could be. However, it may need to change some convert_ctx code in filter.c which I am not excited about. We haven't added convert_ctx changes for a while since it is the old way.

Together with the "u32 bpf_sock_ops_cb_flags;" change in patch 9 which is only for tcp_sock and other _CB flags are also tcp specific only. For now, I am not sure carrying this sockops to the future UDP support is desired.

Take a look at tcp_call_bpf(). It needs to initialize the whole "struct bpf_sock_ops_kern" regardless of what the bpf prog is needed before calling the bpf prog. The "u32 args[4]" is one of them. The is the older way of using bpf to extend kernel.

bpf has struct_ops support now which can pass only what is needed and without the need of doing the convert_ctx in filter.c. The "struct tcp_congestion_ops" can already be implemented in bpf. Take a look at selftests/bpf/progs/bpf_cubic.c. All the BPF_SOCK_OPS_*_CB (e.g. BPF_SOCK_OPS_TS_SCHED_OPT_CB here) could just a "ops" in the struct_ops.

That said, I think the first thing needs to figure out is how to enable bpf time stamping without having side effect on the user space. Continue the sockops approach first and use it to create a selftest bpf prog example. Then we can decide.






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux