On Thu, Mar 03, 2022 at 11:48:19AM +0100, Daniel Borkmann wrote: > On 3/2/22 8:55 PM, Martin KaFai Lau wrote: > [...] > > When tapping at ingress, it currently expects the skb->tstamp is either 0 > > or the (rcv) timestamp. Meaning, the tapping at ingress path > > has already expected the skb->tstamp could be 0 and it will get > > the (rcv) timestamp by ktime_get_real() when needed. > > > > There are two cases for tapping at ingress: > > > > One case is af_packet queues the skb to its sk_receive_queue. > > The skb is either not shared or new clone created. The newly > > added skb_clear_delivery_time() is called to clear the > > delivery_time (if any) and set the (rcv) timestamp if > > needed before the skb is queued to the sk_receive_queue. > [...] > > +DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); > > + > > +/* It is used in the ingress path to clear the delivery_time. > > + * If needed, set the skb->tstamp to the (rcv) timestamp. > > + */ > > +static inline void skb_clear_delivery_time(struct sk_buff *skb) > > +{ > > + if (skb->mono_delivery_time) { > > + skb->mono_delivery_time = 0; > > + if (static_branch_unlikely(&netstamp_needed_key)) > > + skb->tstamp = ktime_get_real(); > > + else > > + skb->tstamp = 0; > > + } > > +} > > + > > static inline void skb_clear_tstamp(struct sk_buff *skb) > [...] > > @@ -2199,6 +2199,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev, > > spin_lock(&sk->sk_receive_queue.lock); > > po->stats.stats1.tp_packets++; > > sock_skb_set_dropcount(sk, skb); > > + skb_clear_delivery_time(skb); > > Maybe not fully clear from your description, but for ingress taps, we are allowed > to mangle timestamp here because main recv loop enters taps via deliver_skb(), which > bumps skb->users refcount and {t,}packet_rcv() always hits the skb_shared(skb) case > which then clones skb.. (and for egress we are covered anyway given dev_queue_xmit_nit() > will skb_clone() once anyway for tx tstamp)? Yes, refcount_inc(&skb->users) and then skb_clone() is my understanding also for the ingress tapping path. On top of that, in general, the current {t,}packet_rcv() is changing other fields of a skb also before queuing it, so it has to be a clone.