On Thu, Oct 31, 2024 at 1:15 AM Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote: > > Jason Xing wrote: > > On Wed, Oct 30, 2024 at 1:37 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > > > > > On 10/29/24 8:04 PM, Jason Xing wrote: > > > >>>>>>> static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > > > >>>>>>> const struct sk_buff *ack_skb, > > > >>>>>>> struct skb_shared_hwtstamps *hwtstamps, > > > >>>>>>> @@ -5549,6 +5575,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb, > > > >>>>>>> u32 tsflags; > > > >>>>>>> > > > >>>>>>> tsflags = READ_ONCE(sk->sk_tsflags); > > > >>>>>>> + if (!sk_tstamp_tx_flags(sk, tsflags, tstype)) > > > >>>>>> > > > >>>>>> I still don't get this part since v2. How does it work with cmsg only > > > >>>>>> SOF_TIMESTAMPING_TX_*? > > > >>>>>> > > > >>>>>> I tried with "./txtimestamp -6 -c 1 -C -N -L ::1" and it does not return any tx > > > >>>>>> time stamp after this patch. > > > >>>>>> > > > >>>>>> I am likely missing something > > > >>>>>> or v2 concluded that this behavior change is acceptable? > > > >>>>> > > > >>>>> Sorry, I submitted this series accidentally removing one important > > > >>>>> thing which is similar to what Vadim Fedorenko mentioned in the v1 > > > >>>>> [1]: > > > >>>>> adding another member like sk_flags_bpf to handle the cmsg case. > > > >>>>> > > > >>>>> Willem, would it be acceptable to add another field in struct sock to > > > >>>>> help us recognise the case where BPF and cmsg works parallelly? > > > >>>>> > > > >>>>> [1]: https://lore.kernel.org/all/662873cb-a897-464e-bdb3-edf01363c3b2@xxxxxxxxx/ > > > >>>> > > > >>>> The current timestamp flags don't need a u32. Maybe just reserve a bit > > > >>>> for this purpose? > > > >>> > > > >>> Sure. Good suggestion. > > > >>> > > > >>> But I think only using one bit to reflect whether the sk->sk_tsflags > > > >>> is used by normal or cmsg features is not enough. The existing > > > >>> implementation in tcp_sendmsg_locked() doesn't override the > > > >>> sk->sk_tsflags even the normal and cmsg features enabled parallelly. > > > >>> It only overrides sockc.tsflags in tcp_sendmsg_locked(). Based on > > > >>> that, even if at some point users suddenly remove the cmsg use and > > > >>> then the prior normal SO_TIMESTAMPING continues to work. > > > >>> > > > >>> How about this, please see below: > > > >>> For now, sk->sk_tsflags only uses 17 bits (see the last one > > > >>> SOF_TIMESTAMPING_OPT_RX_FILTER). The cmsg feature only uses 4 flags > > > >>> (see SOF_TIMESTAMPING_TX_RECORD_MASK in __sock_cmsg_send()). With that > > > >>> said, we could reserve the highest four bits for cmsg use for the > > > >>> moment. Four bits represents four points where we can record the > > > >>> timestamp in the tx case. > > > >>> > > > >>> Do you agree on this point? > > > >> > > > >> I don't follow. > > > >> > > > >> I probably miss the entire point. > > > >> > > > >> The goal for sockcm fields is to start with the sk field and > > > >> optionally override based on cmsg. This is what sockcm_init does for > > > >> tsflags. > > > >> > > > >> This information is for the skb, so these are recording flags. > > > >> > > > >> Why does the new datapath need to know whether features are enabled > > > >> through setsockopt or on a per-call basis with a cmsg? > > > >> > > > >> The goal was always to keep the reporting flags per socket, but make > > > >> the recording flag per packet, mainly for sampling. > > > > > > > > If a user uses 1) cmsg feature, 2) bpf feature at the same time, we > > > > allow each feature to work independently. > > > > > > > > How could it work? It relies on sk_tstamp_tx_flags() function in the > > > > current patch: when we are in __skb_tstamp_tx(), we cannot know which > > > > flags in each feature are set without fetching sk->sk_tsflags and > > > > sk->sk_tsflags_bpf. Then we are able to know what timestamp we want to > > > > record. To put it in a simple way, we're not sure if the user wants to > > > > see a SCHED timestamp by using the cmsg feature in __skb_tstamp_tx() > > > > if we hit this test statement "skb_shinfo(skb)->tx_flags & > > > > SKBTX_SCHED_TSTAMP)". So we need those two socket tsflag fields to > > > > help us. > > > > > > I also don't see how a new bit/integer in a sk can help to tell the per cmsg > > > on/off. This cmsg may have tx timestamp on while the next cmsg can have it off. > > > > It's not hard to use it because we can clear every socket cmsg tsflags > > when we're done the check in tcp_sendmsg_locked() if the cmsg feature > > is not enabled. Then we can accurately know which timestamp should we > > print in the tx path. > > > > > > > > There is still one bit in skb_shinfo(skb)->tx_flags. How about define a > > > SKBTX_BPF for everything. imo, the fine control on > > > SOF_TIMESTAMPING_TX_{SCHED,SOFTWARE} is not useful for bpf. Almost all of the > > > time the bpf program wants all available time stamps (sched, software, and > > > hwtstamp if the NIC has it). > > I like the approach of just calling BPF on every hook. Assuming that > the call is very cheap, which AFAIK is true. > > In that case we don't need complex branching in C to optionally skip > this step, as we do for reporting to userspace. > > All the logic and complexity is in the BPF program itself. > > We obviously then let go of the goal to model the BPF API close to the > existing SO_TIMESTAMPING API. Though I advocated for keeping them > aligned, I also think we should just tailor it to what makes most > sense in the BPF space. > > > Sorry, I really doubt that we can lose the fine control. > > Since BPF is called at each reporting point, no control is lost, > actually. Sorry, I still don't get it :( If there is something wrong with my understanding, please correct me. BPF is only called on every sock_opt point in this case, like BPF_SOCK_OPS_TCP_CONNECT_CB, not every report point of SO_TIMESTAMPING. If we add check to test if skb is set SKBTX_BPF in __skb_tstamp_tx(), then at every point bpf will be called. But it's different from SO_TIMESTAMPING drived by each bit (SCHED/TX_SOFTWARE) to control each point. My question is if we would use SKBTX_BPF for everything, how could we control and know when we hit SCHED/TX_SOFTWARE/ACK time from the bpf programs' perspective? Only one bit... It will print everything without the ability to control. Then if we try the SKBTX_BPF approach, it seems we don't actually insist on adding a test statement in __skb_tstamp_tx(). Instead, we could add into more places (by only checking the SKBTX_BPF flag), say, tcp_write_xmit(), right? I'm not saying I'm opposed to this idea. Instead I think it's very useful, just a few questions haunting me... Thanks, Jason