Mugunthan and all, We have essentially proven our theory, but have yet to solve the problem. Our theory was that the driver was essentially taking a few second nap during transmission, despite there being plenty more work to do. We further theorized that our custom userspace application was somehow preventing the driver from taking this nap, and that's why our RX performance is perfect all the time while our TX is so spotty unless that application is running. Over the last few days, we've went through several test patches and have essentially tracked the issue back to our emac_poll() function not being called. By inserting the following patch snippet, and snippets like it, into various functions in the emac driver, we were able to prove that from time to time these functions sleep for several seconds, despite iperf certainly having given them something to do. (Note: the patch has changed a bit since the output pasted below was captured, as we were looking for more detail, but the core functionality is the same.) =================================================================================================== [[ ---- patch snippet ---- ]] =================================================================================================== --- linux/drivers/net/ethernet/ti/davinci_emac.c 2013-04-08 16:20:19.202550042 -0400 +++ linux_emac_tst/drivers/net/ethernet/ti/davinci_emac.c 2013-04-08 18:45:37.000000000 -0400 @@ -64,6 +64,12 @@ #include "davinci_cpdma.h" +//// DEBUGGING ONLY!!!! +#include <linux/jiffies.h> +static unsigned long qstopped_jifs = 0; +static unsigned long poll_func_jifs = 0; +static u32 last_num_rx_pkts = 0, last_num_tx_pkts = 0; + static int debug_level; module_param(debug_level, int, 0); MODULE_PARM_DESC(debug_level, "DaVinci EMAC debug level (NETIF_MSG bits)"); @@ -1055,7 +1061,19 @@ * queue is stopped then start the queue as we have free desc for tx */ if (unlikely(netif_queue_stopped(ndev))) + { netif_wake_queue(ndev); + if (qstopped_jifs > 0) + { + // we've been stopped, measure how long + if ( time_after_eq(jiffies, (qstopped_jifs + HZ/2)) ) + { + // we've been stopped >= 1/2 second!!! + printk(KERN_ERR "===EMAC: TX queue has been stopped for > 1/2 second!!!\n"); + } + } + qstopped_jifs = 0; + } ndev->stats.tx_packets++; ndev->stats.tx_bytes += len; dev_kfree_skb_any(skb); @@ -1105,7 +1123,11 @@ * tell the kernel to stop sending us tx frames. */ if (unlikely(!cpdma_check_free_tx_desc(priv->txchan))) + { netif_stop_queue(ndev); + // we just stopped, record the time + qstopped_jifs = jiffies; + } return NETDEV_TX_OK; @@ -1374,6 +1396,15 @@ u32 status = 0; u32 num_tx_pkts = 0, num_rx_pkts = 0; + +unsigned long tmp = jiffies; +if ( time_after_eq(tmp, (poll_func_jifs + HZ/2)) ) +{ + // we've been stopped >= 1/2 second!!! + printk(KERN_ERR "===EMAC: POLL FUNCTION, last RX/TX=[%d][%d]\n", last_num_rx_pkts, last_num_tx_pkts); +} +poll_func_jifs = tmp; + /* Check interrupt vectors and call packet processing */ status = emac_read(EMAC_MACINVECTOR); @@ -1385,6 +1416,10 @@ if (status & mask) { num_tx_pkts = cpdma_chan_process(priv->txchan, EMAC_DEF_TX_MAX_SERVICE); + if (num_tx_pkts < 0) + { +printk(KERN_ERR "=== ERR!!! num_tx_pkts=%d\n", num_tx_pkts); + } } /* TX processing */ mask = EMAC_DM644X_MAC_IN_VECTOR_RX_INT_VEC; @@ -1394,8 +1429,17 @@ if (status & mask) { num_rx_pkts = cpdma_chan_process(priv->rxchan, budget); + if (num_rx_pkts < 0) + { +printk(KERN_ERR "=== ERR!!! num_rx_pkts=%d\n", num_rx_pkts); + } } /* RX processing */ + +last_num_tx_pkts = num_tx_pkts; +last_num_rx_pkts = num_rx_pkts; + + mask = EMAC_DM644X_MAC_IN_VECTOR_HOST_INT; if (priv->version == EMAC_VERSION_2) mask = EMAC_DM646X_MAC_IN_VECTOR_HOST_INT; =================================================================================================== [[ ---- patch snippet ---- ]] =================================================================================================== It appears from our debugging that somehow the emac_poll() function is not being called despite there being work to do. As before however, when we start our custom application (which presently does very little from a system calls' perspective), our performance stabilizes greatly. Please see below. Has anyone else seen anything like this with the EMAC, or have any idea why or how this could be happening? We initially thought it could just be a failure to properly reschedule the emac_wait(), but after playing with various items there for a while, we're convinced that's not it. Now we're guessing that perhaps softirqs or something within the DMA world is to blame here, but we're certainly not experts in this area. Again, we've passed in 'nohlt' and have worked to remove power management. So we don't think that plays a role. Likewise, we never see a packet error -- even in our Wireshark traces. We just see the system take a split-second to multi-second break. What specifically causes the emac_poll() and emac_tx_handler() functions to fire, and is there a way to debug that cleanly? Can anyone imagine how a userspace application would be able to improve this behavior so much? Again, thank you so very much for your time. CF Boot log: =================================================================================================== Fresh boot with no network activity, we see this with our debugging in there. =================================================================================================== [ 22.815582] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 22.822937] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 23.817993] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 23.825378] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 24.817810] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 24.825225] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 30.793365] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 30.800811] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 31.427093] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 31.434661] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 32.426879] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 32.434387] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! root@test:~# iperf-i 5 -t 30000 -c 10.22.255.254 [ 43.093078] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 43.100463] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! =================================================================================================== Starting iperf with 4x instances of <Custom App> stuff running. See? It's very happy for a long time. =================================================================================================== ------------------------------------------------------------ Client connecting to 10.22.255.254, TCP port 5001 TCP window size: 19.6 KByte (default) ------------------------------------------------------------ [ 3] local 10.22.0.2 port 44102 connected with 10.22.255.254 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0- 5.0 sec 56.8 MBytes 95.3 Mbits/sec [ 3] 5.0-10.0 sec 56.0 MBytes 94.0 Mbits/sec [[ ---- SNIP ---- ]] [ 3] 615.0-620.0 sec 56.4 MBytes 94.6 Mbits/sec [ 3] 620.0-625.0 sec 55.3 MBytes 92.9 Mbits/sec =================================================================================================== Stopping all 4 <Custom App> instances from running... =================================================================================================== Stopping <Custom App>1... [ OK ] Stopped <Custom App>1. Stopping <Custom App>2... [ OK ] Stopped <Custom App>2. Stopping <Custom App>3... [ OK ] Stopped <Custom App>3. [ 3] 625.0-630.0 sec 55.1 MBytes 92.4 Mbits/sec Stopping <Custom App>4... [ OK ] Stopped <Custom App>4. =================================================================================================== Instantaneously, we see the POLL (and therefore the TX handler) warnings getting called and the performance is slipping and/or timing out all together. =================================================================================================== [ 681.469207] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 681.476501] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 3] 630.0-635.0 sec 8.59 MBytes 14.4 Mbits/sec [ 692.489929] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 692.497222] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 692.504272] ===EMAC: TX queue has been stopped for > 1/2 second!!! [ 3] 635.0-640.0 sec 11.4 MBytes 19.1 Mbits/sec [ 3] 640.0-645.0 sec 0.00 Bytes 0.00 bits/sec [ 3] 645.0-650.0 sec 5.71 MBytes 9.58 Mbits/sec [ 3] 650.0-655.0 sec 29.6 MBytes 49.6 Mbits/sec [ 3] 655.0-660.0 sec 46.7 MBytes 78.3 Mbits/sec [ 714.523315] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 714.530609] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 3] 660.0-665.0 sec 42.3 MBytes 71.0 Mbits/sec [ 3] 665.0-670.0 sec 0.00 Bytes 0.00 bits/sec [ 3] 670.0-675.0 sec 37.5 MBytes 62.9 Mbits/sec [ 3] 675.0-680.0 sec 53.1 MBytes 89.1 Mbits/sec [ 3] 680.0-685.0 sec 44.8 MBytes 75.2 Mbits/sec [ 3] 685.0-690.0 sec 49.2 MBytes 82.6 Mbits/sec [ 3] 690.0-695.0 sec 49.0 MBytes 82.2 Mbits/sec [ 747.616729] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 747.624023] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 3] 695.0-700.0 sec 40.2 MBytes 67.4 Mbits/sec [ 3] 700.0-705.0 sec 5.38 MBytes 9.03 Mbits/sec [ 3] 705.0-710.0 sec 51.8 MBytes 86.9 Mbits/sec Starting <Custom App>2... [ OK ] Started <Custom App>2. [ 3] 710.0-715.0 sec 51.9 MBytes 87.1 Mbits/sec [ 3] 715.0-720.0 sec 54.1 MBytes 90.8 Mbits/sec [ 3] 720.0-725.0 sec 53.6 MBytes 89.9 Mbits/sec [ 3] 725.0-730.0 sec 53.3 MBytes 89.4 Mbits/sec [ 3] 730.0-735.0 sec 53.2 MBytes 89.3 Mbits/sec [ 3] 735.0-740.0 sec 56.1 MBytes 94.1 Mbits/sec [ 3] 740.0-745.0 sec 52.0 MBytes 87.3 Mbits/sec [ 3] 745.0-750.0 sec 52.5 MBytes 88.1 Mbits/sec [ 3] 750.0-755.0 sec 55.9 MBytes 93.8 Mbits/sec [ 3] 755.0-760.0 sec 55.9 MBytes 93.9 Mbits/sec [ 3] 760.0-765.0 sec 53.3 MBytes 89.5 Mbits/sec ^C[ 3] 0.0-765.9 sec 7.93 GBytes 88.9 Mbits/sec =================================================================================================== Shut down iperf and drop back to the default idling =================================================================================================== root@test:~# [ 809.822906] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 809.960937] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 810.822845] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 810.960937] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! [ 813.143280] ===EMAC: POLL FUNCTION has not been called for > 1/2 second!!! [ 813.355712] ===EMAC: TX FUNCTION has not been called for > 1/2 second!!! -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html