Hi,

today I'd like to share a severe problem we've found (and fixed) on our Ceph cluster. We're running 48 OSDs (8 per host). While restarting all OSDs on a host, the kernel's nf_conntrack table overflowed. This rendered all OSDs on that machine unusable.

The symptoms were as follows. In the kernel log, we saw lines like:

| Aug 6 15:23:48 cartman06 kernel: [12713575.554784] nf_conntrack: table full, dropping packet

This is effectively a DoS against the kernel's IP stack.

In the OSD log files, we saw repeated connection attempts like:

| 2014-08-06 15:22:35.348175 7f92f25a8700 10 -- 172.22.4.42:6802/9560 >> 172.22.4.51:0/2025662 pipe(0x7f9208035440 sd=382 :6802 s=2 pgs=26750 cs=1 l=1 c=0x7f92080021c0).fault on lossy channel, failing
| 2014-08-06 15:22:35.348287 7f8fd69e4700 10 -- 172.22.4.42:6802/9560 >> 172.22.4.39:0/3024957 pipe(0x7f9208007b30 sd=149 :6802 s=2 pgs=245725 cs=1 l=1 c=0x7f9208036630).fault on lossy channel, failing
| 2014-08-06 15:22:35.348293 7f8fe24e4700 20 -- 172.22.4.42:6802/9560 >> 172.22.4.38:0/1013265 pipe(0x7f92080476e0 sd=450 :6802 s=4 pgs=32439 cs=1 l=1 c=0x7f9208018e90).writer finishing
| 2014-08-06 15:22:35.348284 7f8fd4fca700 2 -- 172.22.4.42:6802/9560 >> 172.22.4.5:0/3032136 pipe(0x7f92080686b0 sd=305 :6802 s=2 pgs=306100 cs=1 l=1 c=0x7f920805f340).fault 0: Success
| 2014-08-06 15:22:35.348292 7f8fd108b700 20 -- 172.22.4.42:6802/9560 >> 172.22.4.4:0/1000901 pipe(0x7f920802e7d0 sd=401 :6802 s=4 pgs=73173 cs=1 l=1 c=0x7f920802eda0).writer finishing
| 2014-08-06 15:22:35.344719 7f8fd1d98700 2 -- 172.22.4.42:6802/9560 >> 172.22.4.49:0/3026524 pipe(0x7f9208033a80 sd=492 :6802 s=2 pgs=12845 cs=1 l=1 c=0x7f9208033ce0).reader couldn't read tag, Success

and so on, generating thousands of log lines. The OSDs were spinning at 100% CPU, trying to reconnect in rapid succession. These repeated connection attempts kept nf_conntrack from ever recovering from its overflowed state.
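To spot a table that is filling up before the kernel starts logging "table full, dropping packet", one option is to compare the current entry count with the configured limit. The following is a minimal Python sketch, assuming the nf_conntrack module is loaded and exposes the usual procfs files /proc/sys/net/netfilter/nf_conntrack_count and /proc/sys/net/netfilter/nf_conntrack_max. The helper names and the 80% warning threshold are hypothetical choices for illustration, not something we actually ran.

```python
from pathlib import Path

# Usual procfs files exposed by nf_conntrack (assumption: module is loaded)
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_usage(count: int, maximum: int) -> float:
    """Return the fraction of the conntrack table currently in use."""
    if maximum <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return count / maximum


def check(count: int, maximum: int, warn_at: float = 0.8) -> str:
    """Format a status line; warn_at (80%) is a hypothetical threshold."""
    usage = conntrack_usage(count, maximum)
    status = "WARN" if usage >= warn_at else "OK"
    return f"{status}: {count}/{maximum} entries ({usage:.0%})"


def read_int(path: Path) -> int:
    """Read a single integer value from a procfs file."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    try:
        print(check(read_int(COUNT_PATH), read_int(MAX_PATH)))
    except OSError:
        # Files are missing (module not loaded) or unreadable
        print("could not read nf_conntrack counters")
```

Run periodically (e.g. from cron or a monitoring agent), this would show whether a burst of reconnects during an OSD restart pushes the table towards its limit. The limit value used in any test is whatever your kernel reports; it is not derived from our setup.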
Thus, we saw blocked requests for 15 minutes or so, until the MONs banned the stuck OSDs from the cluster.

As a short-term countermeasure, we stopped all OSDs on the affected hosts and started them one by one, leaving enough time in between to let recovery settle a bit (a 10-second gap between OSDs was enough).

During normal operation, we see only 5000-6000 connections on a host. As a permanent fix, we have doubled the size of the nf_conntrack table and reduced some timeouts according to <http://www.pc-freak.net/blog/resolving-nf_conntrack-table-full-dropping-packet-flood-message-in-dmesg-linux-kernel-log/>. Now a restart of all 8 OSDs on a host works without problems.

We also considered removing nf_conntrack completely. This, however, is not possible since we use host-based firewalling and nf_conntrack is wired quite deeply into Linux's firewall code.

I'm just sharing our experience in case someone runs into the same problem.

Regards
Christian

-- 
Dipl.-Inf. Christian Kauhaus <>< · kc at gocept.com · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations