On Wed, 2014-04-16 at 16:53 -0700, Harshavardhana wrote:
> There are no tx/rx errors but
>
> dropped_link_overflow: 10046509
> dropped_link_error_or_filtered: 72353
>
> This is of some concern, but wouldn't be sure what really happened.
> Are you using myricom 10gig interfaces?

Yes. Will have a read and check the switch settings etc.

Thanks for the help.

>
> https://www.myricom.com/software/myri10ge/397-could-you-explain-the-meanings-of-the-myri10ge-counters-reported-in-the-output-of-ethtool.html
>
> =================
> dropped_link_overflow
>
> The number of received packets dropped due to lack of receive
> (on-chip) buffer space. This will happen if:
>
> our driver/firmware is not consuming fast enough, and
> the flow-control is off, or
> the flow-control is on, so we are sending pause frames, but the other
> side does not obey them.
>
> Verify that ethernet flow control is enabled on the 10GbE switch to
> which the adapter is connected.
>
> If the application's traffic is bursty, have you tried the load-time
> option myri10ge_big_rxring=1? Please read: Would you explain the
> Myri10GE load-time option myri10ge_big_rxring?
> =================
>
> =================
> dropped_link_error_or_filtered
>
> The number of received packets that are not received into the receive
> buffer because they are malformed, they are PAUSE frames used for
> Ethernet flow control, they are not destined for the adapter (i.e. the
> packet's destination MAC address does not match the adapter's MAC
> address), or their destination MAC addresses are of the form
> 01:80:C2:00:00:0X (reserved addresses).
>
> If this counter keeps increasing when there is no traffic, then the
> increase is likely due to BPDU. If it only increases during a stress
> test (achieving close to line rate), then the increase is likely due
> to PAUSE. The counter also includes malformed frames due to CRC or
> whatever. Also refer to How do I check for badcrcs when running the
> Myri10GE software?
> for further details.
> =================
>
> May be contacting your Myricom vendors would be a right start?
>
> On Wed, Apr 16, 2014 at 4:15 PM, Franco Broi <franco.broi@xxxxxxxxxx> wrote:
> > What should I be looking for? See below.
> >
> > I thought that maybe it coincided with a bunch of machines waking from
> > sleep, but I don't think that is the case.
> >
> > [root@nas1 ~]# ethtool -S eth2
> > NIC statistics:
> >      rx_packets: 116095907410
> >      tx_packets: 83692116889
> >      rx_bytes: 141224428783450
> >      tx_bytes: 1007756860391628
> >      rx_errors: 0
> >      tx_errors: 0
> >      rx_dropped: 0
> >      tx_dropped: 0
> >      multicast: 0
> >      collisions: 0
> >      rx_length_errors: 0
> >      rx_over_errors: 0
> >      rx_crc_errors: 0
> >      rx_frame_errors: 0
> >      rx_fifo_errors: 0
> >      rx_missed_errors: 0
> >      tx_aborted_errors: 0
> >      tx_carrier_errors: 0
> >      tx_fifo_errors: 0
> >      tx_heartbeat_errors: 0
> >      tx_window_errors: 0
> >      tx_boundary: 4096
> >      WC: 1
> >      irq: 134
> >      MSI: 1
> >      MSIX: 0
> >      read_dma_bw_MBs: 1735
> >      write_dma_bw_MBs: 1715
> >      read_write_dma_bw_MBs: 3421
> >      serial_number: 446488
> >      watchdog_resets: 0
> >      dca_capable_firmware: 1
> >      dca_device_present: 1
> >      link_changes: 2
> >      link_up: 1
> >      dropped_link_overflow: 10046509
> >      dropped_link_error_or_filtered: 72353
> >      dropped_pause: 0
> >      dropped_bad_phy: 0
> >      dropped_bad_crc32: 0
> >      dropped_unicast_filtered: 72353
> >      dropped_multicast_filtered: 24551326
> >      dropped_runt: 0
> >      dropped_overrun: 0
> >      dropped_no_small_buffer: 0
> >      dropped_no_big_buffer: 0
> >      ----------- slice ---------: 0
> >      tx_pkt_start: 2087737864
> >      tx_pkt_done: 2087737864
> >      tx_req: 2508370636
> >      tx_done: 2508370636
> >      rx_small_cnt: 1504058385
> >      rx_big_cnt: 2957794484
> >      wake_queue: 462814
> >      stop_queue: 462814
> >      tx_linearized: 1011916
> >
> >
> > On Wed, 2014-04-16 at 11:38 -0700, Harshavardhana wrote:
> >> Perhaps a driver bug? - have you verified ethtool -S output?
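
For anyone wanting to watch these counters over time rather than eyeball a
single dump, here is a minimal sketch in Python. It is not from the thread:
the function names (parse_ethtool_stats, nonzero_drops, counter_deltas) are
made up for illustration, and it only assumes the "name: value" layout seen
in the ethtool -S output quoted above. It works on saved text, so it can be
fed two dumps taken some minutes apart to see which counters are still
increasing.

```python
import re


def parse_ethtool_stats(text):
    """Parse saved `ethtool -S` text into {counter_name: int}.

    Lines without an integer value (the "NIC statistics:" header, shell
    prompts) are skipped. Leading mail quote markers ("> > ") are tolerated.
    """
    stats = {}
    for line in text.splitlines():
        # Drop leading quote markers and indentation, if any.
        line = re.sub(r"^[>\s]+", "", line)
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue
    return stats


def nonzero_drops(stats):
    """Return only the dropped_* counters that are not zero."""
    return {k: v for k, v in stats.items()
            if k.startswith("dropped_") and v != 0}


def counter_deltas(before, after):
    """Return {name: increase} for counters that changed between two dumps."""
    return {k: after[k] - before[k]
            for k in after
            if k in before and after[k] != before[k]}


# A few lines copied from the dump in this thread.
SAMPLE = """\
> > NIC statistics:
> >      rx_errors: 0
> >      dropped_link_overflow: 10046509
> >      dropped_link_error_or_filtered: 72353
> >      dropped_pause: 0
> >      dropped_unicast_filtered: 72353
"""

if __name__ == "__main__":
    for name, value in sorted(nonzero_drops(parse_ethtool_stats(SAMPLE)).items()):
        print(f"{name}: {value}")
```

If dropped_link_overflow keeps growing between two dumps under load, that
would point back at the flow-control check described in the Myricom text;
the script itself only reports numbers and does not diagnose anything.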
> >>
> >> On Wed, Apr 16, 2014 at 2:42 AM, Franco Broi <franco.broi@xxxxxxxxxx> wrote:
> >> >
> >> > I've increased my tcp_max_syn_backlog to 4096 in the hope it will
> >> > prevent it from happening again but I'm not sure what caused it in the
> >> > first place.
> >> >
> >> > On Wed, 2014-04-16 at 17:25 +0800, Franco Broi wrote:
> >> >> Anyone seen this problem?
> >> >>
> >> >> server
> >> >>
> >> >> Apr 16 14:34:28 nas1 kernel: [7506182.154332] TCP: TCP: Possible SYN flooding on port 49156. Sending cookies. Check SNMP counters.
> >> >> Apr 16 14:34:31 nas1 kernel: [7506185.142589] TCP: TCP: Possible SYN flooding on port 49157. Sending cookies. Check SNMP counters.
> >> >> Apr 16 14:34:53 nas1 kernel: [7506207.126193] TCP: TCP: Possible SYN flooding on port 49159. Sending cookies. Check SNMP counters.
> >> >>
> >> >> client
> >> >>
> >> >> Apr 16 14:34:21 charlie5 GlusterFS[6718]: [2014-04-16 06:34:21.710137] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-data-client-4: server 192.168.35.107:49157 has not responded in the last 42 seconds, disconnecting.
> >> >> Apr 16 14:34:31 charlie5 GlusterFS[6718]: [2014-04-16 06:34:31.711605] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-data-client-2: server 192.168.35.107:49156 has not responded in the last 42 seconds, disconnecting.
> >> >> Apr 16 14:35:13 charlie5 GlusterFS[6718]: [2014-04-16 06:35:13.758227] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-data-client-0: server 192.168.35.107:49159 has not responded in the last 42 seconds, disconnecting.
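
The server and client logs above mention the same brick ports. A rough
sketch, again in Python and not from the thread, of how the two could be
lined up when there are more than a handful of lines. The function names are
hypothetical, and the two patterns are only assumed to match the message
wording shown in these particular logs; other kernel or GlusterFS versions
may word them differently.

```python
import re

# Patterns assumed from the log lines quoted in this thread.
SYN_FLOOD_RE = re.compile(r"Possible SYN flooding on port (\d+)")
PING_TIMEOUT_RE = re.compile(
    r"server (\d+\.\d+\.\d+\.\d+):(\d+) has not responded"
)


def flooded_ports(server_log):
    """Ports for which the server kernel logged a possible SYN flood."""
    return {int(m.group(1)) for m in SYN_FLOOD_RE.finditer(server_log)}


def timed_out_ports(client_log):
    """Server ports for which a client logged a ping timeout."""
    return {int(m.group(2)) for m in PING_TIMEOUT_RE.finditer(client_log)}


def ports_in_both(server_log, client_log):
    """Sorted ports that show up in both the server and the client log."""
    return sorted(flooded_ports(server_log) & timed_out_ports(client_log))


# Excerpts trimmed from the logs in this thread (timestamps dropped).
SERVER_LOG = """\
TCP: TCP: Possible SYN flooding on port 49156. Sending cookies. Check SNMP counters.
TCP: TCP: Possible SYN flooding on port 49157. Sending cookies. Check SNMP counters.
TCP: TCP: Possible SYN flooding on port 49159. Sending cookies. Check SNMP counters.
"""

CLIENT_LOG = """\
0-data-client-4: server 192.168.35.107:49157 has not responded in the last 42 seconds, disconnecting.
0-data-client-2: server 192.168.35.107:49156 has not responded in the last 42 seconds, disconnecting.
0-data-client-0: server 192.168.35.107:49159 has not responded in the last 42 seconds, disconnecting.
"""

if __name__ == "__main__":
    print(ports_in_both(SERVER_LOG, CLIENT_LOG))
```

This only shows that the ports overlap; it does not say whether the SYN
backlog overflow caused the ping timeouts or was a side effect of clients
reconnecting.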
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users