On Mon, Oct 1, 2012 at 8:42 PM, Tommi Virtanen <tv@xxxxxxxxxxx> wrote:
> On Sun, Sep 30, 2012 at 2:55 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>> Short post mortem - EX3200/12.1R2.9 may begin to drop packets (seems
>> to appear more likely on 0.51 traffic patterns, which is very strange
>> for L2 switching) when a bunch of the 802.3ad pairs, sixteen in my
>> case, exposed to extremely high load - database benchmark over 700+
>> rbd-backed VMs and cluster rebalance at same time. It explains
>> post-reboot lockups in igb driver and all types of lockups above. I
>> would very appreciate any suggestions of switch models which do not
>> expose such behavior in simultaneous conditions both off-list and in
>> this thread.
>
> I don't see how a switch dropping packets would give an ethernet card
> driver any excuse to crash, but I'm simultaneously happy to hear that
> it doesn't seem like Ceph is at fault, and sorry for your troubles.
>
> I don't have an up to date 1GbE card recommendation to share, but I
> would recommend making sure you're using a recent Linux kernel.

I stated the cause incorrectly. Of course drops cannot cause a lockup
by themselves, but the switch may somehow create a long-lasting
`corrupt` state on the trunk ports, which then leads to these lockups
in the ethernet card. I'll certainly experiment with driver versions
and card/port settings, thanks for the suggestion :)

I'm still investigating, since the issue is quite hard to reproduce on
demand. I hope to capture this state with software methods such as
tcpdump, although if the card driver locks up on something, that may
prevent the problematic byte sequence from ever reaching the packet
sniffer.
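
One software-side way to notice the moment a card stops receiving, without depending on the sniffer seeing the offending frames, is to poll the kernel's per-interface counters. The following is only a minimal sketch, assuming the usual Linux /proc/net/dev layout (two header lines, then sixteen counters per interface). The function names, the `eth0` interface name in the usage note, and the "suspicious" heuristic are hypothetical illustrations and do not come from this thread.

```python
# Counter names in the order they appear on each /proc/net/dev line.
FIELDS = (
    "rx_bytes", "rx_packets", "rx_errs", "rx_drop",
    "rx_fifo", "rx_frame", "rx_compressed", "rx_multicast",
    "tx_bytes", "tx_packets", "tx_errs", "tx_drop",
    "tx_fifo", "tx_colls", "tx_carrier", "tx_compressed",
)


def parse_proc_net_dev(text):
    """Parse the text of /proc/net/dev into {iface: {counter: value}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = (int(v) for v in data.split())
        stats[name.strip()] = dict(zip(FIELDS, values))
    return stats


def counter_deltas(before, after, iface):
    """Return how much each counter of `iface` grew between two samples."""
    return {k: after[iface][k] - before[iface][k] for k in FIELDS}


def looks_suspicious(delta):
    """Hypothetical heuristic, not a driver-level check.

    Flags an interval in which we kept transmitting but received nothing,
    or in which receive errors or drops increased.
    """
    rx_silent = delta["rx_packets"] == 0 and delta["tx_packets"] > 0
    rx_faults = delta["rx_errs"] > 0 or delta["rx_drop"] > 0
    return rx_silent or rx_faults


def read_counters(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())
```

To use it, call `read_counters()` twice a few seconds apart and pass both samples to `counter_deltas(a, b, "eth0")`. Logging the timestamp whenever `looks_suspicious` returns true gives a point in time to line up against the switch logs, even if the capture itself is empty.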