Re: [net-next v10 1/2] net: sched: use queue_mapping to pick tx queue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/17/22 9:20 AM, Paolo Abeni wrote:
On Tue, 2022-03-15 at 20:48 +0800, Tonghao Zhang wrote:
On Tue, Mar 15, 2022 at 5:59 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
On 3/14/22 3:15 PM, xiangxia.m.yue@xxxxxxxxx wrote:
[...]
   include/linux/netdevice.h |  3 +++
   include/linux/rtnetlink.h |  1 +
   net/core/dev.c            | 31 +++++++++++++++++++++++++++++--
   net/sched/act_skbedit.c   |  6 +++++-
   4 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0d994710b335..f33fb2d6712a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3065,6 +3065,9 @@ struct softnet_data {
       struct {
               u16 recursion;
               u8  more;
+#ifdef CONFIG_NET_EGRESS
+             u8  skip_txqueue;
+#endif
       } xmit;
   #ifdef CONFIG_RPS
       /* input_queue_head should be written by cpu owning this struct,
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 7f970b16da3a..ae2c6a3cec5d 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -100,6 +100,7 @@ void net_dec_ingress_queue(void);
   #ifdef CONFIG_NET_EGRESS
   void net_inc_egress_queue(void);
   void net_dec_egress_queue(void);
+void netdev_xmit_skip_txqueue(bool skip);
   #endif

   void rtnetlink_init(void);
diff --git a/net/core/dev.c b/net/core/dev.c
index 75bab5b0dbae..8e83b7099977 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3908,6 +3908,25 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)

       return skb;
   }
+
+static struct netdev_queue *
+netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
+{
+     int qm = skb_get_queue_mapping(skb);
+
+     return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
+}
+
+static bool netdev_xmit_txqueue_skipped(void)
+{
+     return __this_cpu_read(softnet_data.xmit.skip_txqueue);
+}
+
+void netdev_xmit_skip_txqueue(bool skip)
+{
+     __this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
+}
+EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
   #endif /* CONFIG_NET_EGRESS */

   #ifdef CONFIG_XPS
@@ -4078,7 +4097,7 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
   static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
   {
       struct net_device *dev = skb->dev;
-     struct netdev_queue *txq;
+     struct netdev_queue *txq = NULL;
       struct Qdisc *q;
       int rc = -ENOMEM;
       bool again = false;
@@ -4106,11 +4125,17 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
                       if (!skb)
                               goto out;
               }
+
+             netdev_xmit_skip_txqueue(false);
+
               nf_skip_egress(skb, true);
               skb = sch_handle_egress(skb, &rc, dev);
               if (!skb)
                       goto out;
               nf_skip_egress(skb, false);
+
+             if (netdev_xmit_txqueue_skipped())
+                     txq = netdev_tx_queue_mapping(dev, skb);
       }
   #endif
       /* If device/qdisc don't need skb->dst, release it right now while
@@ -4121,7 +4146,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
       else
               skb_dst_force(skb);

-     txq = netdev_core_pick_tx(dev, skb, sb_dev);
+     if (likely(!txq))

nit: Drop likely(). If the feature is used from sch_handle_egress(), then this would always be the case.
Hi Daniel
I think in most case, we don't use skbedit queue_mapping in the
sch_handle_egress() , so I add likely in fast path.

Yeah, but then let branch predictor do its work ? We can still change and drop the
likely() once we add support for BPF though..

+             txq = netdev_core_pick_tx(dev, skb, sb_dev);
+
       q = rcu_dereference_bh(txq->qdisc);

How will the `netdev_xmit_skip_txqueue(true)` be usable from BPF side (see bpf_convert_ctx_access() ->
queue_mapping)?
Good questions, In other patch, I introduce the
bpf_netdev_skip_txqueue, so we can use netdev_xmit_skip_txqueue in bpf
side

Yeah, that bpf_netdev_skip_txqueue() won't fly. It's basically a helper doing quirk for
an implementation detail (aka calling netdev_xmit_skip_txqueue()). Was hoping you have
something better we could use along with the context rewrite of __sk_buff's queue_mapping,
but worst case we need to rework a bit for BPF. :/

Thanks,
Daniel



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux