On Tuesday, March 26, 2024 10:29 AM, Gomes, Vinicius <vinicius.gomes@xxxxxxxxx> wrote: >Song Yoong Siang <yoong.siang.song@xxxxxxxxx> writes: > >> This patch adds support to per-packet Tx hardware timestamp request to >> AF_XDP zero-copy packet via XDP Tx metadata framework. Please note that >> user needs to enable Tx HW timestamp capability via igc_ioctl() with >> SIOCSHWTSTAMP cmd before sending xsk Tx hardware timestamp request. >> >> Same as implementation in RX timestamp XDP hints kfunc metadata, Timer 0 >> (adjustable clock) is used in xsk Tx hardware timestamp. i225/i226 have >> four sets of timestamping registers. *skb and *xsk_tx_buffer pointers >> are used to indicate whether the timestamping register is already occupied. >> >> Furthermore, a boolean variable named xsk_pending_ts is used to hold the >> transmit completion until the tx hardware timestamp is ready. This is >> because, for i225/i226, the timestamp notification event comes some time >> after the transmit completion event. The driver will retrigger hardware irq >> to clean the packet after retrieve the tx hardware timestamp. >> >> Besides, xsk_meta is added into struct igc_tx_timestamp_request as a hook >> to the metadata location of the transmit packet. When the Tx timestamp >> interrupt is fired, the interrupt handler will copy the value of Tx hwts >> into metadata location via xsk_tx_metadata_complete(). >> >> This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata >> on Intel ADL-S platform. Below are the test steps and results. >> >> Test Step 1: Run xdp_hw_metadata app >> ./xdp_hw_metadata <iface> > /dev/shm/result.log >> >> Test Step 2: Enable Tx hardware timestamp >> hwstamp_ctl -i <iface> -t 1 -r 1 >> >> Test Step 3: Run ptp4l and phc2sys for time synchronization >> >> Test Step 4: Generate UDP packets with 1ms interval for 10s >> trafgen --dev <iface> '{eth(da=<addr>), udp(dp=9091)}' -t 1ms -n 10000 >> >> Test Step 5: Rerun Step 1-3 with 10s iperf3 as background traffic >> >> Test Step 6: Rerun Step 1-4 with 10s iperf3 as background traffic >> >> Based on iperf3 results below, the impact of holding tx completion to >> throughput is not observable. >> >> Result of last UDP packet (no. 10000) in Step 4: >> poll: 1 (0) skip=99 fail=0 redir=10000 >> xsk_ring_cons__peek: 1 >> 0x5640a37972d0: rx_desc[9999]->addr=f2110 addr=f2110 comp_addr=f2110 EoP >> rx_hash: 0x2049BE1D with RSS type:0x1 >> HW RX-time: 1679819246792971268 (sec:1679819246.7930) delta to User RX-time >sec:0.0000 (14.990 usec) >> XDP RX-time: 1679819246792981987 (sec:1679819246.7930) delta to User RX-time >sec:0.0000 (4.271 usec) >> No rx_vlan_tci or rx_vlan_proto, err=-95 >> 0x5640a37972d0: ping-pong with csum=ab19 (want 315b) csum_start=34 >csum_offset=6 >> 0x5640a37972d0: complete tx idx=9999 addr=f010 >> HW TX-complete-time: 1679819246793036971 (sec:1679819246.7930) delta to >User TX-complete-time sec:0.0001 (77.656 usec) >> XDP RX-time: 1679819246792981987 (sec:1679819246.7930) delta to User TX- >complete-time sec:0.0001 (132.640 usec) >> HW RX-time: 1679819246792971268 (sec:1679819246.7930) delta to HW TX- >complete-time sec:0.0001 (65.703 usec) >> 0x5640a37972d0: complete rx idx=10127 addr=f2110 >> >> Result of iperf3 without tx hwts request in step 5: >> [ ID] Interval Transfer Bitrate Retr >> [ 5] 0.00-10.00 sec 2.74 GBytes 2.36 Gbits/sec 0 sender >> [ 5] 0.00-10.05 sec 2.74 GBytes 2.34 Gbits/sec receiver >> >> Result of iperf3 running parallel with trafgen command in step 6: >> [ ID] Interval Transfer Bitrate Retr >> [ 5] 0.00-10.00 sec 2.74 GBytes 2.36 Gbits/sec 0 sender >> [ 5] 0.00-10.04 sec 2.74 GBytes 2.34 Gbits/sec receiver >> >> Co-developed-by: Lai Peter Jun Ann <jun.ann.lai@xxxxxxxxx> >> Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@xxxxxxxxx> >> Signed-off-by: Song Yoong Siang <yoong.siang.song@xxxxxxxxx> >> Acked-by: John Fastabend <john.fastabend@xxxxxxxxx> >> --- >> V1: >https://patchwork.kernel.org/project/netdevbpf/patch/20231215162158.951925-1- >yoong.siang.song@xxxxxxxxx/ >> V2: >https://patchwork.kernel.org/project/netdevbpf/cover/20240301162348.898619-1- >yoong.siang.song@xxxxxxxxx/ >> V3: >https://patchwork.kernel.org/project/netdevbpf/cover/20240303083225.1184165-1- >yoong.siang.song@xxxxxxxxx/ >> >> changelog: >> V1 -> V2 >> - In struct igc_tx_timestamp_request, keep a pointer to igc_tx_buffer, >> instead of pointing xsk_pending_ts (Vinicius). >> - In struct igc_tx_timestamp_request, introduce buffer_type to indicate >> whether skb or igc_tx_buffer pointer should be use (Vinicius). >> - In struct igc_metadata_request, remove igc_adapter pointer (Vinicius). >> - When request tx hwts, copy the value of cmd_type, instead of using >> pointer (Vinicius). >> - For boolean variable, use true and false, instead of 1 and 0 (Vinicius). >> - In igc_xsk_request_timestamp(), make an early return if none of the 4 ts >> registers is available (Vinicius). >> - Create helper functions to clear tx buffer and skb for tstamp (John). >> - Perform throughput test with mix traffic (Vinicius & John). >> V2 -> V3 >> - Improve tstamp reg searching loop for better readability (John). >> - In igc_ptp_free_tx_buffer(), add comment to inform user that >> tstamp->xsk_tx_buffer and tstamp->skb are in union (John). >> V3 -> V4 >> - Add protection with xp_tx_metadata_enabled (Kurt & Maciej). >> --- >> --- >> drivers/net/ethernet/intel/igc/igc.h | 71 ++++++++------ >> drivers/net/ethernet/intel/igc/igc_main.c | 113 ++++++++++++++++++++-- >> drivers/net/ethernet/intel/igc/igc_ptp.c | 51 ++++++++-- >> 3 files changed, 195 insertions(+), 40 deletions(-) >> > >[...] > >> diff --git a/drivers/net/ethernet/intel/igc/igc_ptp.c >b/drivers/net/ethernet/intel/igc/igc_ptp.c >> index 885faaa7b9de..1bb026232efc 100644 >> --- a/drivers/net/ethernet/intel/igc/igc_ptp.c >> +++ b/drivers/net/ethernet/intel/igc/igc_ptp.c >> @@ -11,6 +11,7 @@ >> #include <linux/ktime.h> >> #include <linux/delay.h> >> #include <linux/iopoll.h> >> +#include <net/xdp_sock_drv.h> >> >> #define INCVALUE_MASK 0x7fffffff >> #define ISGN 0x80000000 >> @@ -545,6 +546,30 @@ static void igc_ptp_enable_rx_timestamp(struct >igc_adapter *adapter) >> wr32(IGC_TSYNCRXCTL, val); >> } >> >> +static void igc_ptp_free_tx_buffer(struct igc_adapter *adapter, >> + struct igc_tx_timestamp_request *tstamp) >> +{ >> + if (tstamp->buffer_type == IGC_TX_BUFFER_TYPE_XSK) { >> + /* Release the transmit completion */ >> + tstamp->xsk_tx_buffer->xsk_pending_ts = false; >> + >> + /* Note: tstamp->skb and tstamp->xsk_tx_buffer are in union. >> + * By setting tstamp->xsk_tx_buffer to NULL, tstamp->skb will >> + * become NULL as well. >> + */ >> + tstamp->xsk_tx_buffer = NULL; >> + tstamp->buffer_type = 0; >> + >> + /* Trigger txrx interrupt for transmit completion */ >> + igc_xsk_wakeup(adapter->netdev, tstamp->xsk_queue_index, 0); >> + >> + return; >> + } >> + >> + dev_kfree_skb_any(tstamp->skb); >> + tstamp->skb = NULL; >> +} >> + >> static void igc_ptp_clear_tx_tstamp(struct igc_adapter *adapter) >> { >> unsigned long flags; >> @@ -555,8 +580,8 @@ static void igc_ptp_clear_tx_tstamp(struct igc_adapter >*adapter) >> for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) { >> struct igc_tx_timestamp_request *tstamp = &adapter->tx_tstamp[i]; >> >> - dev_kfree_skb_any(tstamp->skb); >> - tstamp->skb = NULL; >> + if (tstamp->skb) >> + igc_ptp_free_tx_buffer(adapter, tstamp); >> } >> > >More a question: you are potentially triggering an interrupt from >igc_ptp_clear_tx_tstamp() (igc_xsk_wakeup()) which can be called from >igc_down(). So, how does it work when there's a pending timestamp and >you remove the igc module? (example of a situation that it might be >problematic). Hi Vinicius, Thanks for reviewing the patch. There is test_bit(__IGC_DOWN, &adapter->state) checking in igc_sxk_wakeup(). Since igc_down() will set __IGC_DOWN before call into igc_ptp_suspend(), so I believe the checking in igc_sxk_wakeup() should be enough to prevent the situation that you mentioned. > > >Cheers, >-- >Vinicius Thanks & Regards Siang