Re: [PATCH v2 net-next] selftests: forwarding: add Per-Stream Filtering and Policing test for Ocelot

On Thu, May 26, 2022 at 06:40:22AM +0000, Ferenc Fejes wrote:
> Hi Vladimir!
> 
> On Thu, 2022-05-26 at 00:50 +0000, Vladimir Oltean wrote:
> > On Fri, May 06, 2022 at 12:24:37PM +0000, Ferenc Fejes wrote:
> > > Hi Vladimir!
> > > 
> > > On 2022. 05. 06. 14:01, Vladimir Oltean wrote:
> > > > Hi Ferenc,
> > > > 
> > > > On Fri, May 06, 2022 at 07:49:40AM +0000, Ferenc Fejes wrote:
> > > > This is correct. I have been testing only with the offloaded tc-
> > > > gate
> > > > action so I did not notice that the software does not act upon
> > > > the ipv.
> > > > Your proposal sounds straightforward enough. Care to send a bug
> > > > fix patch?
> > > 
> > > Unfortunately I can't; our company policy does not allow direct
> > > open-source contributions :-(
> > > 
> > > However I would be more than happy if you can fix it.
> > 
> > That's too bad.
> > 
> > I have a patch which I am still testing, but you've managed to
> > captivate
> > my attention for saying that you are testing 802.1Qch with a software
> > implementation of tc-gate.
> > 
> > Do you have a use case for this? What cycle times are you targeting?
> > How are you ensuring that you are deterministically meeting the
> > deadlines?
> 
> The cycle times I targeted were nowhere near a realistic TSN
> scenario: I "generated" ping packets every 100 ms, and on the ingress
> port I marked them with prio 1 for 500 ms (gate 1) and prio 2 for
> another 500 ms (gate 2). On the egress port I applied taprio with the
> same base-time and the same 500-500 ms cycles but reverse-ordered
> gates (that's the "definition" of Qch), so while act_gate on the
> ingress is in its gate 1 cycle, taprio keeps gate 2 open and gate 1
> closed, and so on.
> For "verification" I simply ran tcpdump on the listener machine that
> I pinged from the talker and eyeballed whether the bursts of 5
> packets show up in the arrival timestamps.
> 
> > Do you also have a software tc-taprio on the egress device or is that
> > at least offloaded?
> 
> No, I experimented with the software version, which worked both in my
> netns tests and on physical machines (after the IPV patch).
> 
> > 
> > I'm asking these questions because the peristaltic shaper is
> > primarily
> > intended to be used on hardware switches. The patch I'm preparing
> > includes changes to struct sk_buff. I just want to know how well I'll
> > be
> > able to sell these changes to maintainers strongly opposing the
> > growth
> > of this structure for an exceedingly small niche :)
> 
> Can you tell me about the intention behind the sk_buff changes? Is
> that required because of some offloading scenario? In my case,
> putting the IPV into skb->priority was good enough, because taprio
> uses that field by default to select the traffic class for the packet.
> 
> Thanks,
> Ferenc
>

Hopefully this patch explains it:

-----------------------------[ cut here ]-----------------------------
>From c8d33e0d65a4a63884a626dc04c7190a2bbe122b Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@xxxxxxx>
Date: Wed, 25 May 2022 16:27:52 +0300
Subject: [PATCH] net/sched: act_gate: act on Internal Priority Value

IEEE 802.1Q specifies in Annex T ("Cyclic queuing and forwarding") the
use case for an Internal Priority Value set by the PSFP function.
Essentially, the intended use is to be able to select a packet's egress
queue squarely based on its arrival time. In practice, this means that a
packet with VLAN PCP 5 can be delivered to TC 7 or TC 6 depending on
whether it was received in an "odd" or "even" cycle. To quote the
spec, "The use of the IPV allows this direction of frames to outbound
queues to be independent of the received priority, and also does not
affect the priority associated with the frame on transmission."

The problem is that the software implementation of act_gate takes the
IPV as entry from user space, but does not act on it in its software
implementation - only offloading drivers do.

To fix this, we need to keep a current_ipv variable according to the
gate entry that's currently executed by act_gate, and use this IPV to
overwrite the skb->priority.

In fact, a complication arises due to the following clause from 802.1Q:

| 8.6.6.1 PSFP queuing
| If PSFP is supported (8.6.5.1), and the IPV associated with the stream
| filter that passed the frame is anything other than the null value, then
| that IPV is used to determine the traffic class of the frame, in place
| of the frame's priority, via the Traffic Class Table specified in 8.6.6.
| In all other respects, the queuing actions specified in 8.6.6 are
| unchanged. The IPV is used only to determine the traffic class
| associated with a frame, and hence select an outbound queue; for all
| other purposes, the received priority is used.

An example of a layer that we don't want to see the IPV is the
egress-qos-map of a potential VLAN output device. Instead, the VLAN PCP
should still be
populated based on the original skb->priority.

I'm aware of the existence of a skb_update_prio() function in
__dev_queue_xmit(), right before passing the skb to the qdisc layer,
for the use case where net_prio cgroups are used to assign processes to
network priorities on a per-interface basis. But rewriting the
skb->priority with the skb->ipv there simply wouldn't work, exactly
because there might be a VLAN in the output path of the skb.

One might suggest: just delay the IPV rewriting in the presence of
stacked devices until it is actually needed and likely to be used,
like when dev->num_tc != 0 (which VLAN doesn't have). But there are
still other uses of skb->priority in the qdisc layer, like skbprio,
htb etc. From the spec's wording it isn't clear that we want these
functions to look at the proper packet user priority or at the PSFP IPV.
Probably the former.

So the only implementation that conforms to these requirements
seems to be the invasive one, where we introduce a dedicated
helper for the pattern where drivers and the core ask for
netdev_get_prio_tc_map(dev, skb->priority). We replace all such
instances with a helper that selects the TC based on IPV, if the skb
was filtered by PSFP time gates on ingress, and falls back to
skb->priority otherwise.

Fixes: a51c328df310 ("net: qos: introduce a gate control flow action")
Reported-by: Ferenc Fejes <ferenc.fejes@xxxxxxxxxxxx>
Link: https://lore.kernel.org/netdev/87o80h1n65.fsf@kurt/T/#meaec9c24b3add9cb11edd411278a3a6ecf01573e
Signed-off-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
 include/linux/netdevice.h                     | 13 +++++++++++++
 include/linux/skbuff.h                        | 11 +++++++++++
 include/net/tc_act/tc_gate.h                  |  1 +
 net/core/dev.c                                |  4 ++--
 net/dsa/tag_ocelot.c                          |  2 +-
 net/sched/act_gate.c                          |  6 ++++++
 net/sched/sch_taprio.c                        | 12 +++---------
 8 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 77c2e70b0860..719259af9aaa 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8531,7 +8531,7 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
 	int txq;
 
 	if (sb_dev) {
-		u8 tc = netdev_get_prio_tc_map(dev, skb->priority);
+		u8 tc = skb_get_prio_tc_map(skb, dev);
 		struct net_device *vdev = sb_dev;
 
 		txq = vdev->tc_to_txq[tc].offset;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 47b59f99b037..8b87d017288f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2366,6 +2366,19 @@ int netdev_set_prio_tc_map(struct net_device *dev, u8 prio, u8 tc)
 	return 0;
 }
 
+static inline int skb_get_prio_tc_map(const struct sk_buff *skb,
+				      const struct net_device *dev)
+{
+	__u32 prio = skb->priority;
+
+#if IS_ENABLED(CONFIG_NET_ACT_GATE)
+	if (skb->use_ipv)
+		prio = skb->ipv;
+#endif
+
+	return netdev_get_prio_tc_map(dev, prio);
+}
+
 int netdev_txq_to_tc(struct net_device *dev, unsigned int txq);
 void netdev_reset_tc(struct net_device *dev);
 int netdev_set_tc_queue(struct net_device *dev, u8 tc, u16 count, u16 offset);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index da96f0d3e753..b0a463c0bc65 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -913,6 +913,9 @@ typedef unsigned char *sk_buff_data_t;
  *	@csum_start: Offset from skb->head where checksumming should start
  *	@csum_offset: Offset from csum_start where checksum should be stored
  *	@priority: Packet queueing priority
+ *	@use_ipv: Packet internal priority was altered by PSFP
+ *	@ipv: Internal Priority Value, overrides priority for traffic class
+ *		selection
  *	@ignore_df: allow local fragmentation
  *	@cloned: Head may be cloned (check refcnt to be sure)
  *	@ip_summed: Driver fed us an IP checksum
@@ -1145,6 +1148,10 @@ struct sk_buff {
 	__u8			slow_gro:1;
 	__u8			csum_not_inet:1;
 
+#ifdef CONFIG_NET_ACT_GATE
+	__u8			use_ipv:1;
+#endif
+
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
 #endif
@@ -1209,6 +1216,10 @@ struct sk_buff {
 	/* only useable after checking ->active_extensions != 0 */
 	struct skb_ext		*extensions;
 #endif
+
+#ifdef CONFIG_NET_ACT_GATE
+	__u32			ipv;
+#endif
 };
 
 /* if you move pkt_type around you also must adapt those constants */
diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
index c8fa11ebb397..b05c2c7d78e4 100644
--- a/include/net/tc_act/tc_gate.h
+++ b/include/net/tc_act/tc_gate.h
@@ -44,6 +44,7 @@ struct tcf_gate {
 	ktime_t			current_close_time;
 	u32			current_entry_octets;
 	s32			current_max_octets;
+	s32			current_ipv;
 	struct tcfg_gate_entry	*next_entry;
 	struct hrtimer		hitimer;
 	enum tk_offsets		tk_offset;
diff --git a/net/core/dev.c b/net/core/dev.c
index 721ba9c26554..956aa227c260 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3194,7 +3194,7 @@ static u16 skb_tx_hash(const struct net_device *dev,
 	u16 qcount = dev->real_num_tx_queues;
 
 	if (dev->num_tc) {
-		u8 tc = netdev_get_prio_tc_map(dev, skb->priority);
+		u8 tc = skb_get_prio_tc_map(skb, dev);
 
 		qoffset = sb_dev->tc_to_txq[tc].offset;
 		qcount = sb_dev->tc_to_txq[tc].count;
@@ -4002,7 +4002,7 @@ EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
 static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
 			       struct xps_dev_maps *dev_maps, unsigned int tci)
 {
-	int tc = netdev_get_prio_tc_map(dev, skb->priority);
+	int tc = skb_get_prio_tc_map(skb, dev);
 	struct xps_map *map;
 	int queue_index = -1;
 
diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
index 0d81f172b7a6..036e746f18eb 100644
--- a/net/dsa/tag_ocelot.c
+++ b/net/dsa/tag_ocelot.c
@@ -52,7 +52,7 @@ static void ocelot_xmit_common(struct sk_buff *skb, struct net_device *netdev,
 	ocelot_xmit_get_vlan_info(skb, dp, &vlan_tci, &tag_type);
 
 	qos_class = netdev_get_num_tc(netdev) ?
-		    netdev_get_prio_tc_map(netdev, skb->priority) : skb->priority;
+		    skb_get_prio_tc_map(skb, netdev) : skb->priority;
 
 	injection = skb_push(skb, OCELOT_TAG_LEN);
 	prefix = skb_push(skb, OCELOT_SHORT_PREFIX_LEN);
diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
index fd5155274733..9fb248b104f8 100644
--- a/net/sched/act_gate.c
+++ b/net/sched/act_gate.c
@@ -81,6 +81,7 @@ static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
 	gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
 	gact->current_entry_octets = 0;
 	gact->current_max_octets = next->maxoctets;
+	gact->current_ipv = next->ipv;
 
 	gact->current_close_time = ktime_add_ns(gact->current_close_time,
 						next->interval);
@@ -140,6 +141,11 @@ static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
 		}
 	}
 
+	if (gact->current_ipv >= 0) {
+		skb->use_ipv = true;
+		skb->ipv = gact->current_ipv;
+	}
+
 	spin_unlock(&gact->tcf_lock);
 
 	return gact->tcf_action;
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index b9c71a304d39..fb8bc17e38bb 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -201,7 +201,7 @@ static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
 	s32 cycle_elapsed;
 	int tc, n;
 
-	tc = netdev_get_prio_tc_map(dev, skb->priority);
+	tc = skb_get_prio_tc_map(skb, dev);
 	packet_transmit_time = length_to_duration(q, qdisc_pkt_len(skb));
 
 	*interval_start = 0;
@@ -509,7 +509,6 @@ static struct sk_buff *taprio_peek_soft(struct Qdisc *sch)
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct Qdisc *child = q->qdiscs[i];
-		int prio;
 		u8 tc;
 
 		if (unlikely(!child))
@@ -522,9 +521,7 @@ static struct sk_buff *taprio_peek_soft(struct Qdisc *sch)
 		if (TXTIME_ASSIST_IS_ENABLED(q->flags))
 			return skb;
 
-		prio = skb->priority;
-		tc = netdev_get_prio_tc_map(dev, prio);
-
+		tc = skb_get_prio_tc_map(skb, dev);
 		if (!(gate_mask & BIT(tc)))
 			continue;
 
@@ -579,7 +576,6 @@ static struct sk_buff *taprio_dequeue_soft(struct Qdisc *sch)
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct Qdisc *child = q->qdiscs[i];
 		ktime_t guard;
-		int prio;
 		int len;
 		u8 tc;
 
@@ -597,9 +593,7 @@ static struct sk_buff *taprio_dequeue_soft(struct Qdisc *sch)
 		if (!skb)
 			continue;
 
-		prio = skb->priority;
-		tc = netdev_get_prio_tc_map(dev, prio);
-
+		tc = skb_get_prio_tc_map(skb, dev);
 		if (!(gate_mask & BIT(tc))) {
 			skb = NULL;
 			continue;
-----------------------------[ cut here ]-----------------------------
