On 3/2/22 8:55 PM, Martin KaFai Lau wrote:
[...]
When tapping at ingress, it currently expects the skb->tstamp is either 0
or the (rcv) timestamp. Meaning, the tapping at ingress path
has already expected the skb->tstamp could be 0 and it will get
the (rcv) timestamp by ktime_get_real() when needed.
There are two cases for tapping at ingress:
One case is af_packet queues the skb to its sk_receive_queue.
The skb is either not shared or new clone created. The newly
added skb_clear_delivery_time() is called to clear the
delivery_time (if any) and set the (rcv) timestamp if
needed before the skb is queued to the sk_receive_queue.
[...]
+DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
+
+/* It is used in the ingress path to clear the delivery_time.
+ * If needed, set the skb->tstamp to the (rcv) timestamp.
+ */
+static inline void skb_clear_delivery_time(struct sk_buff *skb)
+{
+ if (skb->mono_delivery_time) {
+ skb->mono_delivery_time = 0;
+ if (static_branch_unlikely(&netstamp_needed_key))
+ skb->tstamp = ktime_get_real();
+ else
+ skb->tstamp = 0;
+ }
+}
+
static inline void skb_clear_tstamp(struct sk_buff *skb)
[...]
@@ -2199,6 +2199,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
spin_lock(&sk->sk_receive_queue.lock);
po->stats.stats1.tp_packets++;
sock_skb_set_dropcount(sk, skb);
+ skb_clear_delivery_time(skb);
Maybe not fully clear from your description, but for ingress taps, we are allowed
to mangle timestamp here because main recv loop enters taps via deliver_skb(), which
bumps skb->users refcount and {t,}packet_rcv() always hits the skb_shared(skb) case
which then clones skb.. (and for egress we are covered anyway given dev_queue_xmit_nit()
will skb_clone() once anyway for tx tstamp)?
__skb_queue_tail(&sk->sk_receive_queue, skb);
spin_unlock(&sk->sk_receive_queue.lock);
sk->sk_data_ready(sk);
@@ -2377,6 +2378,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
po->stats.stats1.tp_packets++;
if (copy_skb) {
status |= TP_STATUS_COPY;
+ skb_clear_delivery_time(copy_skb);
__skb_queue_tail(&sk->sk_receive_queue, copy_skb);
}
spin_unlock(&sk->sk_receive_queue.lock);
Thanks,
Daniel