understanding networking code (2.4.18)

Hi all,

I'm trying to learn the networking code of an ancient 2.4.18 vanilla
kernel. I think I understand the theory, but I run into problems when
I try to put it into practice.
Here is the theory as I understand it.

- The egress path of a packet, in a few words:

1. Coming from the upper layers, the egress path of a packet begins with
the function dev_queue_xmit(skb). This function is used by the higher
protocols to send a packet, in the form of a socket buffer, over a
network device.
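For instance, a caller only needs to pick the outgoing device and hand
the buffer over (a made-up fragment, just to show the entry point; the
real callers are the protocol output routines):

        /* assuming dev is the struct net_device we want to send on */
        skb->dev = dev;        /* choose the outgoing network device      */
        dev_queue_xmit(skb);   /* the skb now belongs to the device layer */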

The socket buffer is placed in the output queue of the network device.
This is done by means of the enqueue method of the device's queueing
discipline (dev->qdisc->enqueue).

2. The next step is to invoke qdisc_run(dev). This function simply
calls qdisc_restart(dev) until there are no more packets in the output
queue or until the network device stops accepting packets:

while (!netif_queue_stopped(dev) && qdisc_restart(dev)<0)
               /* NOTHING */;

There is one special case: the device has no queueing discipline with an
enqueue method (dev->qdisc->enqueue == NULL). In this case the packet
goes directly to the dev->hard_start_xmit() function, a device-dependent
transmission function. This case concerns logical network devices, such
as the loopback device.
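Putting points 1 and 2 together, the relevant part of dev_queue_xmit()
looks roughly like this (a simplified sketch from my reading of the
source, with locking details and error handling stripped, not the
literal code):

        int dev_queue_xmit(struct sk_buff *skb)
        {
                struct net_device *dev = skb->dev;
                struct Qdisc *q = dev->qdisc;

                if (q->enqueue) {
                        /* normal case: queue the skb and kick the qdisc */
                        spin_lock_bh(&dev->queue_lock);
                        q->enqueue(skb, q);
                        qdisc_run(dev);
                        spin_unlock_bh(&dev->queue_lock);
                        return 0;
                }

                /* special case: no queue (e.g. loopback), send directly */
                if (dev->flags & IFF_UP)
                        return dev->hard_start_xmit(skb, dev);

                kfree_skb(skb);
                return -ENETDOWN;
        }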

3. In the non-special case, qdisc_restart(dev) is responsible for
getting the next packet from the queue of the network device, using the
queueing strategy (qdisc), and sending it by means of hard_start_xmit().

At this point, before the packet reaches hard_start_xmit(), it is
possible to run into problems:

A. The network device is currently unable to send packets, i.e. the
netif_queue_stopped(dev) test is true.

B. The lock dev->xmit_lock is already taken. This spinlock is normally
taken in qdisc_restart() when the transmission of a packet is about to
start, so another CPU owns the lock because it is sending another packet.

In both cases the socket buffer is placed back into the queue and,
finally, NET_TX_SOFTIRQ is raised in netif_schedule(), via
cpu_raise_softirq(cpu, NET_TX_SOFTIRQ).
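In code, qdisc_restart() boils down to something like this (again a
condensed sketch, not the literal source; the deadloop check and some
of the locking are omitted):

        int qdisc_restart(struct net_device *dev)
        {
                struct Qdisc *q = dev->qdisc;
                struct sk_buff *skb = q->dequeue(q);   /* qdisc strategy */

                if (skb == NULL)
                        return q->q.qlen;              /* queue is empty */

                if (spin_trylock(&dev->xmit_lock)) {
                        dev->xmit_lock_owner = smp_processor_id();
                        if (!netif_queue_stopped(dev) &&
                            dev->hard_start_xmit(skb, dev) == 0) {
                                dev->xmit_lock_owner = -1;
                                spin_unlock(&dev->xmit_lock);
                                return -1;             /* sent, try next */
                        }
                        dev->xmit_lock_owner = -1;
                        spin_unlock(&dev->xmit_lock);
                }

                /* case A or B: requeue and reschedule via the softirq */
                q->ops->requeue(skb, q);
                netif_schedule(dev);
                return 1;
        }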

4. When do_softirq() is invoked, the function net_tx_action(),
associated with NET_TX_SOFTIRQ, is called. net_tx_action() is
responsible for invoking qdisc_run() again.
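As far as I can see, net_tx_action() does roughly the following (a rough
sketch; the local_irq_disable()/enable() pairs around the per-CPU queue
handling are omitted):

        static void net_tx_action(struct softirq_action *h)
        {
                int cpu = smp_processor_id();

                /* 1) free the skbs queued by dev_kfree_skb_irq() */
                if (softnet_data[cpu].completion_queue) {
                        struct sk_buff *clist = softnet_data[cpu].completion_queue;

                        softnet_data[cpu].completion_queue = NULL;
                        while (clist) {
                                struct sk_buff *skb = clist;

                                clist = clist->next;
                                __kfree_skb(skb);
                        }
                }

                /* 2) rerun the qdiscs of devices queued by __netif_schedule() */
                if (softnet_data[cpu].output_queue) {
                        struct net_device *head = softnet_data[cpu].output_queue;

                        softnet_data[cpu].output_queue = NULL;
                        while (head) {
                                struct net_device *dev = head;

                                head = head->next_sched;
                                clear_bit(__LINK_STATE_SCHED, &dev->state);
                                if (spin_trylock(&dev->queue_lock)) {
                                        qdisc_run(dev);
                                        spin_unlock(&dev->queue_lock);
                                } else {
                                        netif_schedule(dev);
                                }
                        }
                }
        }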

In short:

- Function path for the normal transmission process:
dev_queue_xmit(skb)->qdisc_run(dev)->qdisc_restart(dev)->hard_start_xmit()

- Function path for the softirq transmission process:
dev_queue_xmit(skb)-> qdisc_run(dev)-> qdisc_restart(dev)->
netif_schedule(dev)->cpu_raise_softirq()->net_tx_action()->qdisc_run(dev)

OK, that is the theory. In order to test it I have instrumented several
control points (counters) with the following patch:


-------------- BEGIN PATCH ------------------
diff -uprN -X dontdiff linux-2.4.18/include/linux/netdevice.h linux-2.4.18_softirq/include/linux/netdevice.h
--- linux-2.4.18/include/linux/netdevice.h      2006-11-21 10:26:05.000000000 +0100
+++ linux-2.4.18_softirq/include/linux/netdevice.h      2007-01-03 15:01:23.000000000 +0100
@@ -480,7 +480,16 @@ struct softnet_data
       struct sk_buff          *completion_queue;
} __attribute__((__aligned__(SMP_CACHE_BYTES)));

+extern struct softirq_counters {
+        unsigned int raise_from_netif_cont;
+        unsigned int raise_from_kfree_skb_cont;
+        unsigned int hard_start_xmit_cont;
+        unsigned int xmit_lock_grabbed_cont;
+        unsigned int tx_completion_cont;
+        unsigned int tx_output_cont;
+};

+extern struct softirq_counters softirq_stats;
extern struct softnet_data softnet_data[NR_CPUS];

#define HAVE_NETIF_QUEUE
@@ -490,7 +499,10 @@ static inline void __netif_schedule(stru
       if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
               unsigned long flags;
               int cpu = smp_processor_id();
-
+
+#ifdef CONFIG_PROC_FS
+               softirq_stats.raise_from_netif_cont++;
+#endif
               local_irq_save(flags);
               dev->next_sched = softnet_data[cpu].output_queue;
               softnet_data[cpu].output_queue = dev;
@@ -540,6 +552,9 @@ static inline void dev_kfree_skb_irq(str
               int cpu =smp_processor_id();
               unsigned long flags;

+#ifdef CONFIG_PROC_FS
+               softirq_stats.raise_from_kfree_skb_cont++;
+#endif
               local_irq_save(flags);
               skb->next = softnet_data[cpu].completion_queue;
               softnet_data[cpu].completion_queue = skb;
diff -uprN -X dontdiff linux-2.4.18/net/core/dev.c linux-2.4.18_softirq/net/core/dev.c
--- linux-2.4.18/net/core/dev.c 2002-02-25 20:38:14.000000000 +0100
+++ linux-2.4.18_softirq/net/core/dev.c 2007-01-03 13:04:40.000000000 +0100
@@ -107,6 +107,7 @@
extern int plip_init(void);
#endif

+struct softirq_counters softirq_stats;

/* This define, if set, will randomly drop a packet when congestion
 * is more than moderate.  It helps fairness in the multi-interface
@@ -1329,6 +1330,9 @@ static void net_tx_action(struct softirq
       if (softnet_data[cpu].completion_queue) {
               struct sk_buff *clist;

+#ifdef CONFIG_PROC_FS
+               softirq_stats.tx_completion_cont++;
+#endif
               local_irq_disable();
               clist = softnet_data[cpu].completion_queue;
               softnet_data[cpu].completion_queue = NULL;
@@ -1346,6 +1350,9 @@ static void net_tx_action(struct softirq
       if (softnet_data[cpu].output_queue) {
               struct net_device *head;

+#ifdef CONFIG_PROC_FS
+               softirq_stats.tx_output_cont++;
+#endif
               local_irq_disable();
               head = softnet_data[cpu].output_queue;
               softnet_data[cpu].output_queue = NULL;
@@ -1793,6 +1800,29 @@ static int dev_proc_stats(char *buffer,
       return len;
}

+static int softirq_proc_stats(char *buffer, char **start, off_t offset,
+               int length, int *eof, void *data)
+{
+       int len = 0;
+
+       len += sprintf(buffer+len, "raise_from_netif_cont: %d\n"
+                       "raise_from_kfree_skb_cont: %d\n"
+                       "(qdisc_restart) hard_start_xmit_cont: %d\n"
+                       "(qdisc_restart) xmit_lock_grabbed_cont: %d\n"
+                       "tx_completion_cont: %d\n"
+                       "tx_output_cont: %d\n",
+                       softirq_stats.raise_from_netif_cont,
+                       softirq_stats.raise_from_kfree_skb_cont,
+                       softirq_stats.hard_start_xmit_cont,
+                       softirq_stats.xmit_lock_grabbed_cont,
+                       softirq_stats.tx_completion_cont,
+                       softirq_stats.tx_output_cont);
+
+       *eof = 1;
+
+       return len;
+}
+
#endif /* CONFIG_PROC_FS */

@@ -2855,6 +2885,7 @@ int __init net_dev_init(void)
#ifdef CONFIG_PROC_FS
       proc_net_create("dev", 0, dev_get_info);
       create_proc_read_entry("net/softnet_stat", 0, 0, dev_proc_stats, NULL);
+       create_proc_read_entry("net/softirq_stat", 0, 0, softirq_proc_stats, NULL);
#ifdef WIRELESS_EXT
       proc_net_create("wireless", 0, dev_get_wireless_info);
#endif /* WIRELESS_EXT */
diff -uprN -X dontdiff linux-2.4.18/net/sched/sch_generic.c linux-2.4.18_softirq/net/sched/sch_generic.c
--- linux-2.4.18/net/sched/sch_generic.c        2000-08-18 19:26:25.000000000 +0200
+++ linux-2.4.18_softirq/net/sched/sch_generic.c        2007-01-03 12:59:53.000000000 +0100
@@ -97,6 +97,9 @@ int qdisc_restart(struct net_device *dev
                                       spin_unlock(&dev->xmit_lock);

                                       spin_lock(&dev->queue_lock);
+#ifdef CONFIG_PROC_FS
+                                       softirq_stats.hard_start_xmit_cont++;
+#endif
                                       return -1;
                               }
                       }
@@ -114,6 +117,9 @@ int qdisc_restart(struct net_device *dev
                          it by checking xmit owner and drop the
                          packet when deadloop is detected.
                        */
+#ifdef CONFIG_PROC_FS
+                       softirq_stats.xmit_lock_grabbed_cont++;
+#endif
                       if (dev->xmit_lock_owner == smp_processor_id()) {
                               kfree_skb(skb);
                               if (net_ratelimit())
-------------- END PATCH ------------------

When I send a lot of UDP packets at a very high rate I obtain these values:

# cat /proc/net/softirq_stat

raise_from_netif_cont: 0
raise_from_kfree_skb_cont: 0
(qdisc_restart) hard_start_xmit_cont: 4338
(qdisc_restart) xmit_lock_grabbed_cont: 0
tx_completion_cont: 1718
tx_output_cont: 160

Here is my problem: tx_completion_cont and tx_output_cont are counters
inside net_tx_action(). This function is only run (marked ready for
execution) when cpu_raise_softirq() is invoked for NET_TX_SOFTIRQ. The
only two places where this is done are:

1. include/linux/netdevice.h -> __netif_schedule()
2. include/linux/netdevice.h -> dev_kfree_skb_irq()

So the counters raise_from_netif_cont and raise_from_kfree_skb_cont
should never be zero!

Can anybody tell me what the problem is?
What am I doing wrong?

I'm sorry for my English :-(

--
Javi Roman.


