> -----Original Message----- > From: Vladimir Oltean <olteanv@xxxxxxxxx> > Sent: 2021年4月18日 17:19 > To: Y.b. Lu <yangbo.lu@xxxxxxx> > Cc: netdev@xxxxxxxxxxxxxxx; Richard Cochran <richardcochran@xxxxxxxxx>; > Vladimir Oltean <vladimir.oltean@xxxxxxx>; David S . Miller > <davem@xxxxxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Jonathan Corbet > <corbet@xxxxxxx>; Kurt Kanzenbach <kurt@xxxxxxxxxxxxx>; Andrew Lunn > <andrew@xxxxxxx>; Vivien Didelot <vivien.didelot@xxxxxxxxx>; Florian > Fainelli <f.fainelli@xxxxxxxxx>; Claudiu Manoil <claudiu.manoil@xxxxxxx>; > Alexandre Belloni <alexandre.belloni@xxxxxxxxxxx>; > UNGLinuxDriver@xxxxxxxxxxxxx; linux-doc@xxxxxxxxxxxxxxx; > linux-kernel@xxxxxxxxxxxxxxx > Subject: Re: [net-next 1/3] net: dsa: optimize tx timestamp request handling > > On Fri, Apr 16, 2021 at 08:36:53PM +0800, Yangbo Lu wrote: > > Optimization could be done on dsa_skb_tx_timestamp(), and dsa device > > drivers should adapt to it. > > > > - Check SKBTX_HW_TSTAMP request flag at the very beginning, instead of in > > port_txtstamp, so that most skbs not requiring tx timestamp just return. > > Agree that this is a trivial performance optimization with no downside that we > should be making. > > > - No longer to identify PTP packets, and limit tx timestamping only for PTP > > packets. If device driver likes, let device driver do. > > Agree that DSA has a way too heavy hand in imposing upon the driver which > packets should be timestampable and which ones shouldn't. > > For example, I have a latency measurement tool called isochron which is based > on hardware timestamping of non-PTP packets (in order to not disturb the PTP > state machines): > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub. > com%2Fvladimiroltean%2Ftsn-scripts&data=04%7C01%7Cyangbo.lu%40 > nxp.com%7C3772f63deb6f4491933208d9024af750%7C686ea1d3bc2b4c6fa9 > 2cd99c5c301635%7C0%7C0%7C637543343267914018%7CUnknown%7CTW > FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXV > CI6Mn0%3D%7C1000&sdata=deOn03j6biPEUwppNkv%2F9if1u5HIdLEtzz > AkWW0p1rc%3D&reserved=0 > > I can't use it on DSA interfaces, for rather artificial reasons. > > > - It is a waste to clone skb directly in dsa_skb_tx_timestamp(). > > For one-step timestamping, a clone is not needed. For any failure of > > port_txtstamp (this may usually happen), the skb clone has to be freed. > > So put skb cloning into port_txtstamp where it really needs. > > Mostly agree. For two-step timestamping, it is an operation which all drivers > need to do, so it is in the common potion. If we want to support one-step, we > need to avoid cloning the PTP packets. > Thanks a lot for your ACK. > > Signed-off-by: Yangbo Lu <yangbo.lu@xxxxxxx> > > --- > > Documentation/networking/timestamping.rst | 7 +++++-- > > .../net/dsa/hirschmann/hellcreek_hwtstamp.c | 20 ++++++++++++------ > > .../net/dsa/hirschmann/hellcreek_hwtstamp.h | 2 +- > > drivers/net/dsa/mv88e6xxx/hwtstamp.c | 21 > +++++++++++++------ > > drivers/net/dsa/mv88e6xxx/hwtstamp.h | 6 +++--- > > drivers/net/dsa/ocelot/felix.c | 11 ++++++---- > > drivers/net/dsa/sja1105/sja1105_ptp.c | 6 +++++- > > drivers/net/dsa/sja1105/sja1105_ptp.h | 2 +- > > include/net/dsa.h | 2 +- > > net/dsa/slave.c | 20 +++++------------- > > 10 files changed, 57 insertions(+), 40 deletions(-) > > > > diff --git a/Documentation/networking/timestamping.rst > > b/Documentation/networking/timestamping.rst > > index f682e88fa87e..7f04a699a5d1 100644 > > --- a/Documentation/networking/timestamping.rst > > +++ b/Documentation/networking/timestamping.rst > > @@ -635,8 +635,8 @@ in generic code: a BPF classifier > > (``ptp_classify_raw``) is used to identify PTP event messages (any > > other packets, including PTP general messages, are not timestamped), and > provides two hooks to drivers: > > > > -- ``.port_txtstamp()``: The driver is passed a clone of the > > timestampable skb > > - to be transmitted, before actually transmitting it. Typically, a > > switch will > > +- ``.port_txtstamp()``: A clone of the timestampable skb to be > > +transmitted > > + is needed, before actually transmitting it. Typically, a switch > > +will > > have a PTP TX timestamp register (or sometimes a FIFO) where the > timestamp > > becomes available. There may be an IRQ that is raised upon this > timestamp's > > availability, or the driver might have to poll after invoking @@ > > -645,6 +645,9 @@ timestamped), and provides two hooks to drivers: > > later use (when the timestamp becomes available). Each skb is annotated > with > > a pointer to its clone, in ``DSA_SKB_CB(skb)->clone``, to ease the driver's > > job of keeping track of which clone belongs to which skb. > > + But one-step timestamping request is handled differently with above > > + two-step timestamping. The skb clone is no longer needed since > > + hardware will insert TX time information on packet during egress. > > Bonus points for updating the documentation, but I don't quite like the end > result. Please feel free to restructure more, in order to have a clearer and more > coherent explanation. > > Also, this paragraph from right above is no longer true: > > In code, DSA provides for most of the infrastructure for timestamping > already, > in generic code: a BPF classifier (``ptp_classify_raw``) is used to identify > PTP event messages (any other packets, including PTP general messages, > are not > timestamped), and provides two hooks to drivers: > > It's nothing like that anymore. It's more of a passthrough now with your > changes, the BPF classifier is not run by the DSA core but optionally by > individual taggers. > > Here is my attempt of rewriting this documentation paragraph, feel free to > take which parts you consider relevant: > > -----------------------------[cut here]----------------------------- > > In the generic layer, DSA provides the following infrastructure for PTP > timestamping: > > - ``.port_txtstamp()``: a hook called prior to the transmission of > packets with a hardware TX timestamping request from user space. > This is required for two-step timestamping, since the hardware > timestamp becomes available after the actual MAC transmission, so the > driver must be prepared to correlate the timestamp with the original > packet so that it can re-enqueue the packet back into the socket's > error queue. To save the packet for when the timestamp becomes > available, the driver can call ``skb_clone_sk`` and save the resulting > clone in ``DSA_SKB_CB(skb)->clone``. Typically, a switch will have a > PTP TX timestamp register (or sometimes a FIFO) where the timestamp > becomes available. In case of a FIFO, the hardware might store > key-value pairs of PTP sequence ID/message type/domain number and the > actual timestamp. To perform the correlation correctly between the > packets in a queue waiting for timestamping and the actual timestamps, > drivers can use a BPF classifier (``ptp_classify_raw``) to identify > the PTP transport type, and ``ptp_parse_header`` to interpret the PTP > header fields. There may be an IRQ that is raised upon this > timestamp's availability, or the driver might have to poll after > invoking ``dev_queue_xmit()`` towards the host interface. > One-step TX timestamping do not require packet cloning, since there is > no follow-up message required by the PTP protocol (because the > TX timestamp is embedded into the packet by the MAC), and therefore > user space does not expect the packet annotated with the TX timestamp > to be re-enqueued into its socket's error queue. > > - ``.port_rxtstamp()``: On RX, the BPF classifier is run by DSA to > identify PTP event messages (any other packets, including PTP general > messages, are not timestamped). The original (and only) timestampable > skb is provided to the driver, for it to annotate it with a timestamp, > if that is immediately available, or defer to later. On reception, > timestamps might either be available in-band (through metadata in the > DSA header, or attached in other ways to the packet), or out-of-band > (through another RX timestamping FIFO). Deferral on RX is typically > necessary when retrieving the timestamp needs a sleepable context. In > that case, it is the responsibility of the DSA driver to call > ``netif_rx_ni()`` on the freshly timestamped skb. > > -----------------------------[cut here]----------------------------- > That's really considerate and accuracy description. I will update with this. (Will adjust if we can drop dsa_skb_cb in dsa core.) Thanks. > > > > - ``.port_rxtstamp()``: The original (and only) timestampable skb is provided > > to the driver, for it to annotate it with a timestamp, if that is > > immediately diff --git > > a/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.c > > b/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.c > > index 69dd9a2e8bb6..2ff4b7c08b72 100644 > > --- a/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.c > > +++ b/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.c > > @@ -374,31 +374,39 @@ long hellcreek_hwtstamp_work(struct > > ptp_clock_info *ptp) } > > > > bool hellcreek_port_txtstamp(struct dsa_switch *ds, int port, > > - struct sk_buff *clone, unsigned int type) > > + struct sk_buff *skb, struct sk_buff **clone) > > { > > struct hellcreek *hellcreek = ds->priv; > > struct hellcreek_port_hwtstamp *ps; > > struct ptp_header *hdr; > > + unsigned int type; > > > > ps = &hellcreek->ports[port].port_hwtstamp; > > > > - /* Check if the driver is expected to do HW timestamping */ > > - if (!(skb_shinfo(clone)->tx_flags & SKBTX_HW_TSTAMP)) > > + type = ptp_classify_raw(skb); > > + if (type == PTP_CLASS_NONE) > > return false; > > > > /* Make sure the message is a PTP message that needs to be > timestamped > > * and the interaction with the HW timestamping is enabled. If not, stop > > * here > > */ > > - hdr = hellcreek_should_tstamp(hellcreek, port, clone, type); > > + hdr = hellcreek_should_tstamp(hellcreek, port, skb, type); > > if (!hdr) > > return false; > > > > + *clone = skb_clone_sk(skb); > > + if (!(*clone)) > > + return false; > > + > > if (test_and_set_bit_lock(HELLCREEK_HWTSTAMP_TX_IN_PROGRESS, > > - &ps->state)) > > + &ps->state)) { > > + kfree_skb(*clone); > > + *clone = NULL; > > return false; > > + } > > > > - ps->tx_skb = clone; > > + ps->tx_skb = *clone; > > > > /* store the number of ticks occurred since system start-up till this > > * moment > > diff --git a/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.h > > b/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.h > > index c0745ffa1ebb..58cc96642076 100644 > > --- a/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.h > > +++ b/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.h > > @@ -45,7 +45,7 @@ int hellcreek_port_hwtstamp_get(struct dsa_switch > > *ds, int port, bool hellcreek_port_rxtstamp(struct dsa_switch *ds, int port, > > struct sk_buff *clone, unsigned int type); bool > > hellcreek_port_txtstamp(struct dsa_switch *ds, int port, > > - struct sk_buff *clone, unsigned int type); > > + struct sk_buff *skb, struct sk_buff **clone); > > > > int hellcreek_get_ts_info(struct dsa_switch *ds, int port, > > struct ethtool_ts_info *info); > > diff --git a/drivers/net/dsa/mv88e6xxx/hwtstamp.c > > b/drivers/net/dsa/mv88e6xxx/hwtstamp.c > > index 094d17a1d037..280a95962861 100644 > > --- a/drivers/net/dsa/mv88e6xxx/hwtstamp.c > > +++ b/drivers/net/dsa/mv88e6xxx/hwtstamp.c > > @@ -469,24 +469,33 @@ long mv88e6xxx_hwtstamp_work(struct > > ptp_clock_info *ptp) } > > > > bool mv88e6xxx_port_txtstamp(struct dsa_switch *ds, int port, > > - struct sk_buff *clone, unsigned int type) > > + struct sk_buff *skb, struct sk_buff **clone) > > { > > struct mv88e6xxx_chip *chip = ds->priv; > > struct mv88e6xxx_port_hwtstamp *ps = &chip->port_hwtstamp[port]; > > struct ptp_header *hdr; > > + unsigned int type; > > > > - if (!(skb_shinfo(clone)->tx_flags & SKBTX_HW_TSTAMP)) > > - return false; > > + type = ptp_classify_raw(skb); > > + if (type == PTP_CLASS_NONE) > > + return false > > > > - hdr = mv88e6xxx_should_tstamp(chip, port, clone, type); > > + hdr = mv88e6xxx_should_tstamp(chip, port, skb, type); > > if (!hdr) > > return false; > > > > + *clone = skb_clone_sk(skb); > > + if (!(*clone)) > > + return false; > > + > > if (test_and_set_bit_lock(MV88E6XXX_HWTSTAMP_TX_IN_PROGRESS, > > - &ps->state)) > > + &ps->state)) { > > + kfree_skb(*clone); > > + *clone = NULL; > > return false; > > + } > > > > - ps->tx_skb = clone; > > + ps->tx_skb = *clone; > > ps->tx_tstamp_start = jiffies; > > ps->tx_seq_id = be16_to_cpu(hdr->sequence_id); > > > > diff --git a/drivers/net/dsa/mv88e6xxx/hwtstamp.h > > b/drivers/net/dsa/mv88e6xxx/hwtstamp.h > > index 9da9f197ba02..da2b253334d0 100644 > > --- a/drivers/net/dsa/mv88e6xxx/hwtstamp.h > > +++ b/drivers/net/dsa/mv88e6xxx/hwtstamp.h > > @@ -118,7 +118,7 @@ int mv88e6xxx_port_hwtstamp_get(struct > dsa_switch > > *ds, int port, bool mv88e6xxx_port_rxtstamp(struct dsa_switch *ds, int > port, > > struct sk_buff *clone, unsigned int type); bool > > mv88e6xxx_port_txtstamp(struct dsa_switch *ds, int port, > > - struct sk_buff *clone, unsigned int type); > > + struct sk_buff *skb, struct sk_buff **clone); > > > > int mv88e6xxx_get_ts_info(struct dsa_switch *ds, int port, > > struct ethtool_ts_info *info); > > @@ -152,8 +152,8 @@ static inline bool mv88e6xxx_port_rxtstamp(struct > > dsa_switch *ds, int port, } > > > > static inline bool mv88e6xxx_port_txtstamp(struct dsa_switch *ds, int port, > > - struct sk_buff *clone, > > - unsigned int type) > > + struct sk_buff *skb, > > + struct sk_buff **clone) > > { > > return false; > > } > > diff --git a/drivers/net/dsa/ocelot/felix.c > > b/drivers/net/dsa/ocelot/felix.c index 6b5442be0230..cdec2f5e271c > > 100644 > > --- a/drivers/net/dsa/ocelot/felix.c > > +++ b/drivers/net/dsa/ocelot/felix.c > > @@ -1396,14 +1396,17 @@ static bool felix_rxtstamp(struct dsa_switch > > *ds, int port, } > > > > static bool felix_txtstamp(struct dsa_switch *ds, int port, > > - struct sk_buff *clone, unsigned int type) > > + struct sk_buff *skb, struct sk_buff **clone) > > { > > struct ocelot *ocelot = ds->priv; > > struct ocelot_port *ocelot_port = ocelot->ports[port]; > > > > - if (ocelot->ptp && (skb_shinfo(clone)->tx_flags & SKBTX_HW_TSTAMP) > && > > - ocelot_port->ptp_cmd == IFH_REW_OP_TWO_STEP_PTP) { > > - ocelot_port_add_txtstamp_skb(ocelot, port, clone); > > + if (ocelot->ptp && ocelot_port->ptp_cmd == > IFH_REW_OP_TWO_STEP_PTP) { > > + *clone = skb_clone_sk(skb); > > + if (!(*clone)) > > + return false; > > + > > + ocelot_port_add_txtstamp_skb(ocelot, port, *clone); > > return true; > > } > > > > diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c > > b/drivers/net/dsa/sja1105/sja1105_ptp.c > > index 1b90570b257b..6a1f854a8c33 100644 > > --- a/drivers/net/dsa/sja1105/sja1105_ptp.c > > +++ b/drivers/net/dsa/sja1105/sja1105_ptp.c > > @@ -436,7 +436,7 @@ bool sja1105_port_rxtstamp(struct dsa_switch *ds, > int port, > > * callback, where we will timestamp it synchronously. > > */ > > bool sja1105_port_txtstamp(struct dsa_switch *ds, int port, > > - struct sk_buff *skb, unsigned int type) > > + struct sk_buff *skb, struct sk_buff **clone) > > { > > struct sja1105_private *priv = ds->priv; > > struct sja1105_port *sp = &priv->ports[port]; @@ -444,6 +444,10 @@ > > bool sja1105_port_txtstamp(struct dsa_switch *ds, int port, > > if (!sp->hwts_tx_en) > > return false; > > > > + *clone = skb_clone_sk(skb); > > + if (!(*clone)) > > + return false; > > + > > return true; > > } > > > > diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.h > > b/drivers/net/dsa/sja1105/sja1105_ptp.h > > index 3daa33e98e77..ab80b73219cb 100644 > > --- a/drivers/net/dsa/sja1105/sja1105_ptp.h > > +++ b/drivers/net/dsa/sja1105/sja1105_ptp.h > > @@ -105,7 +105,7 @@ bool sja1105_port_rxtstamp(struct dsa_switch *ds, > int port, > > struct sk_buff *skb, unsigned int type); > > > > bool sja1105_port_txtstamp(struct dsa_switch *ds, int port, > > - struct sk_buff *skb, unsigned int type); > > + struct sk_buff *skb, struct sk_buff **clone); > > > > int sja1105_hwtstamp_get(struct dsa_switch *ds, int port, struct > > ifreq *ifr); > > > > diff --git a/include/net/dsa.h b/include/net/dsa.h index > > 1259b0f40684..c8415c324e27 100644 > > --- a/include/net/dsa.h > > +++ b/include/net/dsa.h > > @@ -734,7 +734,7 @@ struct dsa_switch_ops { > > int (*port_hwtstamp_set)(struct dsa_switch *ds, int port, > > struct ifreq *ifr); > > bool (*port_txtstamp)(struct dsa_switch *ds, int port, > > - struct sk_buff *clone, unsigned int type); > > + struct sk_buff *skb, struct sk_buff **clone); > > How about not passing "clone" back to DSA as an argument by reference, but > instead require the driver to populate DSA_SKB_CB(skb)->clone if it needs to do > so? > > Also, how about changing the return type to void? Returning true or false > makes no difference. > > > bool (*port_rxtstamp)(struct dsa_switch *ds, int port, > > struct sk_buff *skb, unsigned int type); > > > > diff --git a/net/dsa/slave.c b/net/dsa/slave.c index > > 9300cb66e500..5b746a903ef4 100644 > > --- a/net/dsa/slave.c > > +++ b/net/dsa/slave.c > > @@ -19,7 +19,6 @@ > > #include <linux/if_bridge.h> > > #include <linux/if_hsr.h> > > #include <linux/netpoll.h> > > -#include <linux/ptp_classify.h> > > > > #include "dsa_priv.h" > > > > @@ -555,26 +554,19 @@ static void dsa_skb_tx_timestamp(struct > dsa_slave_priv *p, > > struct sk_buff *skb) > > { > > struct dsa_switch *ds = p->dp->ds; > > - struct sk_buff *clone; > > - unsigned int type; > > + struct sk_buff *clone = NULL; > > > > - type = ptp_classify_raw(skb); > > - if (type == PTP_CLASS_NONE) > > + if (!(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) > > return; > > > > if (!ds->ops->port_txtstamp) > > return; > > > > - clone = skb_clone_sk(skb); > > - if (!clone) > > + if (!ds->ops->port_txtstamp(ds, p->dp->index, skb, &clone)) > > return; > > > > - if (ds->ops->port_txtstamp(ds, p->dp->index, clone, type)) { > > + if (clone) > > DSA_SKB_CB(skb)->clone = clone; > > - return; > > - } > > - > > - kfree_skb(clone); > > } > > > > netdev_tx_t dsa_enqueue_skb(struct sk_buff *skb, struct net_device > > *dev) @@ -628,9 +620,7 @@ static netdev_tx_t dsa_slave_xmit(struct > > sk_buff *skb, struct net_device *dev) > > > > DSA_SKB_CB(skb)->clone = NULL; > > > > - /* Identify PTP protocol packets, clone them, and pass them to the > > - * switch driver > > - */ > > + /* Handle tx timestamp request if has */ > > s/if has/if any/ Sorry. Will fix. > > > dsa_skb_tx_timestamp(p, skb); > > > > if (dsa_realloc_skb(skb, dev)) { > > -- > > 2.25.1 > >