Hi Christian,

This is good advice. Presumably we saw this issue before, since we have the
following in our cluster's puppet manifest:

  sysctl { "net.netfilter.nf_conntrack_max": val => "1024000", }
  sysctl { "net.nf_conntrack_max": val => "1024000", }

But I don't remember when or how we discovered this, and Google isn't
helping. I suggest that this should be added to the ceph.com docs. (A sketch
of the settings in question is inline below, after Christian's description of
the fix.)

Cheers, Dan

--
Dan van der Ster || Data & Storage Services || CERN IT Department

--

On 08 Aug 2014, at 10:46, Christian Kauhaus <kc at gocept.com> wrote:

> Hi,
>
> today I'd like to share a severe problem we've found (and fixed) on our
> Ceph cluster. We're running 48 OSDs (8 per host). While restarting all OSDs
> on a host, the kernel's nf_conntrack table overflowed. This rendered all
> OSDs on that machine unusable.
>
> The symptoms were as follows. In the kernel log, we saw lines like:
>
> | Aug 6 15:23:48 cartman06 kernel: [12713575.554784] nf_conntrack: table
> full, dropping packet
>
> This is effectively a DoS against the kernel's IP stack.
>
> In the OSD log files, we saw repeated connection attempts like:
>
> | 2014-08-06 15:22:35.348175 7f92f25a8700 10 -- 172.22.4.42:6802/9560 >>
> 172.22.4.51:0/2025662 pipe(0x7f9208035440 sd=382 :6802 s=2 pgs=26750 cs=1 l=1
> c=0x7f92080021c0).fault on lossy channel, failing
> | 2014-08-06 15:22:35.348287 7f8fd69e4700 10 -- 172.22.4.42:6802/9560 >>
> 172.22.4.39:0/3024957 pipe(0x7f9208007b30 sd=149 :6802 s=2 pgs=245725 cs=1 l=1
> c=0x7f9208036630).fault on lossy channel, failing
> | 2014-08-06 15:22:35.348293 7f8fe24e4700 20 -- 172.22.4.42:6802/9560 >>
> 172.22.4.38:0/1013265 pipe(0x7f92080476e0 sd=450 :6802 s=4 pgs=32439 cs=1 l=1
> c=0x7f9208018e90).writer finishing
> | 2014-08-06 15:22:35.348284 7f8fd4fca700 2 -- 172.22.4.42:6802/9560 >>
> 172.22.4.5:0/3032136 pipe(0x7f92080686b0 sd=305 :6802 s=2 pgs=306100 cs=1 l=1
> c=0x7f920805f340).fault 0: Success
> | 2014-08-06 15:22:35.348292 7f8fd108b700 20 -- 172.22.4.42:6802/9560 >>
> 172.22.4.4:0/1000901 pipe(0x7f920802e7d0 sd=401 :6802 s=4 pgs=73173 cs=1 l=1
> c=0x7f920802eda0).writer finishing
> | 2014-08-06 15:22:35.344719 7f8fd1d98700 2 -- 172.22.4.42:6802/9560 >>
> 172.22.4.49:0/3026524 pipe(0x7f9208033a80 sd=492 :6802 s=2 pgs=12845 cs=1 l=1
> c=0x7f9208033ce0).reader couldn't read tag, Success
>
> and so on, generating thousands of log lines. The OSDs were spinning at
> 100% CPU, trying to reconnect in rapid succession. The repeated connection
> attempts kept nf_conntrack from getting out of its overflowed state.
>
> Thus, we saw blocked requests for 15 minutes or so, until the MONs banned
> the stuck OSDs from the cluster.
>
> As a short-term countermeasure, we stopped all OSDs on the affected hosts
> and started them one by one, leaving enough time in between to allow the
> recovery to settle a bit (a 10 second gap between OSDs was enough). During
> normal operation, we see only 5000-6000 connections on a host.
>
> As a permanent fix, we have doubled the size of the nf_conntrack table and
> reduced some timeouts according to
> <http://www.pc-freak.net/blog/resolving-nf_conntrack-table-full-dropping-packet-flood-message-in-dmesg-linux-kernel-log/>.
> Now a restart of all 8 OSDs on a host works without problems.
>
> Alternatively, we have considered removing nf_conntrack completely. This,
> however, is not possible since we use host-based firewalling and
> nf_conntrack is wired quite deeply into the Linux firewall code.
>
> Just to share our experience in case someone runs into the same problem.
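For anyone who wants to apply the same tuning by hand, here is a rough sketch
of the kind of sysctl settings Christian describes: a much larger conntrack
table plus shorter TCP conntrack timeouts. The concrete values and the choice
of which timeouts to shorten are only an example (not taken from his or our
configuration), so size them to your own peak connection count:

  # /etc/sysctl.d/90-conntrack.conf -- illustrative values only
  # Leave plenty of headroom above the 5000-6000 connections seen in normal
  # operation, so a simultaneous restart of all OSDs cannot fill the table.
  # (Some kernels expose the same knob as net.nf_conntrack_max, as in the
  # puppet snippet above.)
  net.netfilter.nf_conntrack_max = 1024000

  # Expire finished or idle flows sooner so they free table slots faster
  # (the kernel defaults are considerably higher).
  net.netfilter.nf_conntrack_tcp_timeout_established = 86400
  net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
  net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
  net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30

Apply it without a reboot with "sysctl -p /etc/sysctl.d/90-conntrack.conf",
and watch net.netfilter.nf_conntrack_count against nf_conntrack_max during an
OSD restart to verify that the headroom is sufficient.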
>
> Regards
>
> Christian
>
> --
> Dipl.-Inf. Christian Kauhaus <>< · kc at gocept.com · systems administration
> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
> http://gocept.com · tel +49 345 219401-11
> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com