nf_conntrack overflow crashes OSDs

Hi Christian,
This is good advice. Presumably we saw this issue before, since we have the following in our cluster's puppet manifest:

  sysctl { "net.netfilter.nf_conntrack_max": val => "1024000", }
  sysctl { "net.nf_conntrack_max": val => "1024000", }

But I don't remember when or how we discovered this, and Google isn't helping. I suggest adding this to the ceph.com docs.
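
For anyone wondering how close their hosts are to the limit, comparing the live entry count against the table size is a quick check:

  # how full is the conntrack table right now?
  sysctl net.netfilter.nf_conntrack_count
  sysctl net.netfilter.nf_conntrack_max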

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 08 Aug 2014, at 10:46, Christian Kauhaus <kc at gocept.com> wrote:

> Hi,
> 
> today I'd like to share a severe problem we found (and fixed) on our Ceph
> cluster. We're running 48 OSDs (8 per host). While restarting all OSDs on a
> host, the kernel's nf_conntrack table overflowed, which rendered all OSDs on
> that machine unusable.
> 
> The symptoms were as follows. In the kernel log, we saw lines like:
> 
> | Aug  6 15:23:48 cartman06 kernel: [12713575.554784] nf_conntrack: table
> full, dropping packet
> 
> This is effectively a DoS against the kernel's IP stack.
> 
> In the OSD log files, we saw repeated connection attempts like:
> 
> | 2014-08-06 15:22:35.348175 7f92f25a8700 10 -- 172.22.4.42:6802/9560 >>
> 172.22.4.51:0/2025662 pipe(0x7f9208035440 sd=382 :6802 s=2 pgs=26750 cs=1 l=1
> c=0x7f92080021c0).fault on lossy channel, failing
> | 2014-08-06 15:22:35.348287 7f8fd69e4700 10 -- 172.22.4.42:6802/9560 >>
> 172.22.4.39:0/3024957 pipe(0x7f9208007b30 sd=149 :6802 s=2 pgs=245725 cs=1 l=1
> c=0x7f9208036630).fault on lossy channel, failing
> | 2014-08-06 15:22:35.348293 7f8fe24e4700 20 -- 172.22.4.42:6802/9560 >>
> 172.22.4.38:0/1013265 pipe(0x7f92080476e0 sd=450 :6802 s=4 pgs=32439 cs=1 l=1
> c=0x7f9208018e90).writer finishing
> | 2014-08-06 15:22:35.348284 7f8fd4fca700  2 -- 172.22.4.42:6802/9560 >>
> 172.22.4.5:0/3032136 pipe(0x7f92080686b0 sd=305 :6802 s=2 pgs=306100 cs=1 l=1
> c=0x7f920805f340).fault 0: Success
> | 2014-08-06 15:22:35.348292 7f8fd108b700 20 -- 172.22.4.42:6802/9560 >>
> 172.22.4.4:0/1000901 pipe(0x7f920802e7d0 sd=401 :6802 s=4 pgs=73173 cs=1 l=1
> c=0x7f920802eda0).writer finishing
> | 2014-08-06 15:22:35.344719 7f8fd1d98700  2 -- 172.22.4.42:6802/9560 >>
> 172.22.4.49:0/3026524 pipe(0x7f9208033a80 sd=492 :6802 s=2 pgs=12845 cs=1 l=1
> c=0x7f9208033ce0).reader couldn't read tag, Success
> 
> and so on, generating thousands of log lines. The OSDs were spinning at 100%
> CPU, trying to reconnect in rapid succession. The repeated connection
> attempts kept nf_conntrack from getting out of its overflowed state.
> 
> Thus, we saw blocked requests for 15 minutes or so, until the MONs banned the
> stuck OSDs from the cluster.
> 
> As a short-term countermeasure, we stopped all OSDs on the affected hosts and
> started them one by one, leaving enough time in between to let the recovery
> settle a bit (a 10-second gap between OSDs was enough). During normal
> operation, we see only 5000-6000 connections on a host.
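> 
> In script form, the staggered restart amounts to something like this (a
> sketch assuming the classic sysvinit ceph init script; OSD ids vary per
> host):
> 
>   for id in $(seq 0 7); do
>       /etc/init.d/ceph start osd.$id   # bring up one OSD
>       sleep 10                         # let recovery settle before the next
>   done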
> 
> As a permanent fix, we have doubled the size of the nf_conntrack table and
> reduced some timeouts according to
> <http://www.pc-freak.net/blog/resolving-nf_conntrack-table-full-dropping-packet-flood-message-in-dmesg-linux-kernel-log/>.
> Now a restart of all 8 OSDs on a host works without problems.
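> 
> For illustration, the relevant knobs live under net.netfilter in
> /etc/sysctl.conf. The numbers below are placeholders rather than a tuned
> recommendation (we simply doubled our previous nf_conntrack_max):
> 
>   # illustrative values only; size the table to your workload
>   net.netfilter.nf_conntrack_max = 262144
>   # expire established-connection entries after 1 day instead of 5
>   net.netfilter.nf_conntrack_tcp_timeout_established = 86400
>   # expire TIME_WAIT entries after 30 s instead of 120
>   net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30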
> 
> Alternatively, we considered removing nf_conntrack completely. This,
> however, is not possible for us, since we use host-based firewalling and
> nf_conntrack is wired quite deeply into the Linux firewall code.
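> 
> A middle ground we have not tried would be to exempt only the Ceph cluster
> network from tracking via the raw table, e.g. (untested sketch; the subnet
> is the one from our logs):
> 
>   iptables -t raw -A PREROUTING -s 172.22.4.0/24 -j NOTRACK
>   iptables -t raw -A OUTPUT -d 172.22.4.0/24 -j NOTRACK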
> 
> Just sharing our experience in case someone runs into the same problem.
> 
> Regards
> 
> Christian
> 
> -- 
> Dipl.-Inf. Christian Kauhaus <>< · kc at gocept.com · systems administration
> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
> http://gocept.com · tel +49 345 219401-11
> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations


