There are no tx/rx errors, but:

dropped_link_overflow: 10046509
dropped_link_error_or_filtered: 72353

This is of some concern, but I wouldn't be sure what really happened. Are you
using Myricom 10GigE interfaces?

https://www.myricom.com/software/myri10ge/397-could-you-explain-the-meanings-of-the-myri10ge-counters-reported-in-the-output-of-ethtool.html

=================
dropped_link_overflow

The number of received packets dropped due to lack of receive (on-chip)
buffer space. This will happen if:

- our driver/firmware is not consuming fast enough and flow control is off, or
- flow control is on, so we are sending PAUSE frames, but the other side does
  not obey them.

Verify that Ethernet flow control is enabled on the 10GbE switch to which the
adapter is connected. If the application's traffic is bursty, have you tried
the load-time option myri10ge_big_rxring=1? Please read: "Would you explain
the Myri10GE load-time option myri10ge_big_rxring?"
=================

=================
dropped_link_error_or_filtered

The number of received packets that are not received into the receive buffer
because they are malformed, they are PAUSE frames used for Ethernet flow
control, they are not destined for the adapter (i.e. the packet's destination
MAC address does not match the adapter's MAC address), or their destination
MAC addresses are of the form 01:80:C2:00:00:0X (reserved addresses).

If this counter keeps increasing when there is no traffic, then the increase
is likely due to BPDUs. If it only increases during a stress test (achieving
close to line rate), then the increase is likely due to PAUSE frames. The
counter also includes malformed frames, e.g. due to CRC errors. Also refer to
"How do I check for badcrcs when running the Myri10GE software?" for further
details.
=================

Maybe contacting your Myricom vendor would be a good start?

(Two small counter-checking sketches are appended at the end of this message.)

On Wed, Apr 16, 2014 at 4:15 PM, Franco Broi <franco.broi@xxxxxxxxxx> wrote:
> What should I be looking for? See below.
>
> I thought that maybe it coincided with a bunch of machines waking from
> sleep, but I don't think that is the case.
>
> [root@nas1 ~]# ethtool -S eth2
> NIC statistics:
> rx_packets: 116095907410
> tx_packets: 83692116889
> rx_bytes: 141224428783450
> tx_bytes: 1007756860391628
> rx_errors: 0
> tx_errors: 0
> rx_dropped: 0
> tx_dropped: 0
> multicast: 0
> collisions: 0
> rx_length_errors: 0
> rx_over_errors: 0
> rx_crc_errors: 0
> rx_frame_errors: 0
> rx_fifo_errors: 0
> rx_missed_errors: 0
> tx_aborted_errors: 0
> tx_carrier_errors: 0
> tx_fifo_errors: 0
> tx_heartbeat_errors: 0
> tx_window_errors: 0
> tx_boundary: 4096
> WC: 1
> irq: 134
> MSI: 1
> MSIX: 0
> read_dma_bw_MBs: 1735
> write_dma_bw_MBs: 1715
> read_write_dma_bw_MBs: 3421
> serial_number: 446488
> watchdog_resets: 0
> dca_capable_firmware: 1
> dca_device_present: 1
> link_changes: 2
> link_up: 1
> dropped_link_overflow: 10046509
> dropped_link_error_or_filtered: 72353
> dropped_pause: 0
> dropped_bad_phy: 0
> dropped_bad_crc32: 0
> dropped_unicast_filtered: 72353
> dropped_multicast_filtered: 24551326
> dropped_runt: 0
> dropped_overrun: 0
> dropped_no_small_buffer: 0
> dropped_no_big_buffer: 0
> ----------- slice ---------: 0
> tx_pkt_start: 2087737864
> tx_pkt_done: 2087737864
> tx_req: 2508370636
> tx_done: 2508370636
> rx_small_cnt: 1504058385
> rx_big_cnt: 2957794484
> wake_queue: 462814
> stop_queue: 462814
> tx_linearized: 1011916
>
>
> On Wed, 2014-04-16 at 11:38 -0700, Harshavardhana wrote:
>> Perhaps a driver bug? - have you verified ethtool -S output?
>>
>> On Wed, Apr 16, 2014 at 2:42 AM, Franco Broi <franco.broi@xxxxxxxxxx> wrote:
>> >
>> > I've increased my tcp_max_syn_backlog to 4096 in the hope it will
>> > prevent it from happening again but I'm not sure what caused it in the
>> > first place.
>> >
>> > On Wed, 2014-04-16 at 17:25 +0800, Franco Broi wrote:
>> >> Anyone seen this problem?
>> >>
>> >> server
>> >>
>> >> Apr 16 14:34:28 nas1 kernel: [7506182.154332] TCP: TCP: Possible SYN flooding on port 49156. Sending cookies. Check SNMP counters.
>> >> Apr 16 14:34:31 nas1 kernel: [7506185.142589] TCP: TCP: Possible SYN flooding on port 49157. Sending cookies. Check SNMP counters.
>> >> Apr 16 14:34:53 nas1 kernel: [7506207.126193] TCP: TCP: Possible SYN flooding on port 49159. Sending cookies. Check SNMP counters.
>> >>
>> >> client
>> >>
>> >> Apr 16 14:34:21 charlie5 GlusterFS[6718]: [2014-04-16 06:34:21.710137] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-data-client-4: server 192.168.35.107:49157 has not responded in the last 42 seconds, disconnecting.
>> >> Apr 16 14:34:31 charlie5 GlusterFS[6718]: [2014-04-16 06:34:31.711605] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-data-client-2: server 192.168.35.107:49156 has not responded in the last 42 seconds, disconnecting.
>> >> Apr 16 14:35:13 charlie5 GlusterFS[6718]: [2014-04-16 06:35:13.758227] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-data-client-0: server 192.168.35.107:49159 has not responded in the last 42 seconds, disconnecting.
>> >>
>> >>
>> >> _______________________________________________
>> >> Gluster-users mailing list
>> >> Gluster-users@xxxxxxxxxxx
>> >> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>> >
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > Gluster-users@xxxxxxxxxxx
>> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>
>
--
Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
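
The Myricom FAQ text quoted at the top suggests watching whether
dropped_link_error_or_filtered climbs while the link is idle (pointing at
BPDUs) or only under heavy load (pointing at PAUSE frames or overflow). A
minimal sketch of such a watcher, assuming the interface is eth2 and that
ethtool -S prints plain "name: value" pairs, might look like the following;
the script and its names are illustrative, not part of any Gluster or Myricom
tooling:

#!/usr/bin/env python
# Hypothetical helper: poll `ethtool -S <iface>` and print deltas of the
# drop counters, to see whether they grow while the link is idle
# (suggesting BPDUs) or only under load (suggesting PAUSE frames/overflow).
import subprocess
import time

IFACE = "eth2"  # assumption: same interface as in the output above
WATCH = ("dropped_link_overflow", "dropped_link_error_or_filtered",
         "dropped_pause", "dropped_multicast_filtered")

def read_stats(iface):
    out = subprocess.check_output(["ethtool", "-S", iface]).decode()
    stats = {}
    for line in out.splitlines():
        name, sep, value = line.rpartition(":")
        if sep:
            try:
                stats[name.strip()] = int(value)
            except ValueError:
                pass  # skip non-numeric lines such as "NIC statistics:"
    return stats

prev = read_stats(IFACE)
while True:
    time.sleep(10)
    cur = read_stats(IFACE)
    delta = {k: cur.get(k, 0) - prev.get(k, 0) for k in WATCH}
    if any(delta.values()):
        print("%s %s" % (time.strftime("%H:%M:%S"), delta))
    prev = cur

Running it through both quiet and busy periods should show which of the two
cases described in the FAQ applies here.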
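
The kernel's "Possible SYN flooding ... Check SNMP counters" messages refer
to the TcpExt counters exposed in /proc/net/netstat (also summarised by
netstat -s or nstat). A small sketch that prints the SYN-cookie and
listen-queue counters, assuming the usual two-line name/value layout of that
file, could be:

#!/usr/bin/env python
# Hypothetical helper for the "Check SNMP counters" hint: print the TcpExt
# counters from /proc/net/netstat related to SYN cookies and listen-queue
# drops (the same numbers `netstat -s` summarises).
INTERESTING = ("SyncookiesSent", "SyncookiesRecv", "SyncookiesFailed",
               "ListenOverflows", "ListenDrops")

def tcp_ext_counters(path="/proc/net/netstat"):
    with open(path) as f:
        lines = f.read().splitlines()
    counters = {}
    # The file is laid out as pairs of lines: a header line with counter
    # names followed by a line with the matching values.
    for header, values in zip(lines[0::2], lines[1::2]):
        if header.startswith("TcpExt:"):
            names = header.split()[1:]
            nums = [int(v) for v in values.split()[1:]]
            counters.update(zip(names, nums))
    return counters

if __name__ == "__main__":
    stats = tcp_ext_counters()
    for name in INTERESTING:
        print("%s: %d" % (name, stats.get(name, 0)))

Non-zero ListenOverflows/ListenDrops alongside SyncookiesSent would be
consistent with the brick's SYN/listen backlog being overrun, which is what
raising tcp_max_syn_backlog (and possibly net.core.somaxconn) is meant to
help with.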