Re: [PATCH bpf-next v5 02/13] xsk: Add TX timestamp and TX checksum offload support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 11/13/23 15:10, Stanislav Fomichev wrote:
On Mon, Nov 13, 2023 at 5:16 AM Jesper Dangaard Brouer <hawk@xxxxxxxxxx> wrote:


On 11/2/23 23:58, Stanislav Fomichev wrote:
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 2ecf79282c26..b0ee7ad19b51 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -106,6 +106,41 @@ struct xdp_options {
   #define XSK_UNALIGNED_BUF_ADDR_MASK \
       ((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)

+/* Request transmit timestamp. Upon completion, put it into tx_timestamp
+ * field of struct xsk_tx_metadata.
+ */
+#define XDP_TXMD_FLAGS_TIMESTAMP             (1 << 0)
+
+/* Request transmit checksum offload. Checksum start position and offset
+ * are communicated via csum_start and csum_offset fields of struct
+ * xsk_tx_metadata.
+ */
+#define XDP_TXMD_FLAGS_CHECKSUM                      (1 << 1)
+
+/* AF_XDP offloads request. 'request' union member is consumed by the driver
+ * when the packet is being transmitted. 'completion' union member is
+ * filled by the driver when the transmit completion arrives.
+ */
+struct xsk_tx_metadata {
+     union {
+             struct {
+                     __u32 flags;
+
+                     /* XDP_TXMD_FLAGS_CHECKSUM */
+
+                     /* Offset from desc->addr where checksumming should start. */
+                     __u16 csum_start;
+                     /* Offset from csum_start where checksum should be stored. */
+                     __u16 csum_offset;
+             } request;
+
+             struct {
+                     /* XDP_TXMD_FLAGS_TIMESTAMP */
+                     __u64 tx_timestamp;
+             } completion;
+     };
+};

This looks wrong to me. It looks like member @flags is not avail at
completion time.  At completion time, I assume we also want to know if
someone requested to get the timestamp for this packet (else we could
read garbage).

I've moved the parts that are preserved across tx and tx completion
into xsk_tx_metadata_compl.
This is to address Magnus/Maciej feedback where userspace might race
with the kernel.
See: https://lore.kernel.org/bpf/ZNoJenzKXW5QSR3E@boxer/


Does this mean that every driver have to extend their TX-desc ring with
sizeof(struct xsk_tx_metadata_compl)?
Won't this affect the performance of this V5?

$ pahole -C xsk_tx_metadata_compl ./drivers/net/ethernet/stmicro/stmmac/stmmac.ko
 struct xsk_tx_metadata_compl {
	__u64 *              tx_timestamp;         /*     0     8 */

	/* size: 8, cachelines: 1, members: 1 */
	/* last cacheline: 8 bytes */
 };

Guess, I must be misunderstanding, as I was expecting to see the @flags
member being preserved across, as I get the race there.

Looking at stmmac driver, it does look like this xsk_tx_metadata_compl
is part of the TX-ring for completion (tx_skbuff_dma) and the
tx_timestamp data is getting stored here.  How is userspace AF_XDP
application getting access to the tx_timestamp data?
I though this was suppose to get stored in metadata data area (umem)?

Also looking at the code, the kernel would not have a "crash" race on
the flags member (if we preserve in struct), because the code checks the
driver HW-TS config-state + TX-descriptor for the availability of a
HW-TS in the descriptor.


Another thing (I've raised this before): It would be really practical to
store an u64 opaque value at TX and then read it at Completion time.
One use-case is a forwarding application storing HW RX-time and
comparing this to TX completion time to deduce the time spend processing
the packet.

This can be another member, right? But note that extending
xsk_tx_metadata_compl might be a bit complicated because drivers have
to carry this info somewhere. So we have to balance the amount of
passed data between the tx and the completion.

I don't think my opaque value proposal is subject to same race problem.
I think this can be stores in metadata area and across tx and tx
completion, because any race on a flags change is the userspace
programmers problem, as it cannot cause any kernel crash (given kernel
have no need to read this).

--Jesper






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux