Avoid Ubuntu Linux kernel 4.15.0-36

As a little "heads-up":

If you are running Ubuntu Bionic 18.04, or Xenial 16.04 with "HWE"
kernels, and have systems running 4.15.0-36 - which was the default
kernel between 2018-10-01 and 2018-10-22 - please consider upgrading to
the latest 4.15.0-38 ASAP (or downgrading to 4.15.0-34).
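
For checking a host quickly, here is a minimal Python sketch. It assumes
the running kernel release string has the usual Ubuntu form, such as
"4.15.0-36-generic"; the helper name is_affected is made up for this
example.

```python
import platform

# The kernel build named in this message.
AFFECTED_ABI = "4.15.0-36"


def is_affected(release: str) -> bool:
    """Return True if a `uname -r` style string names the 4.15.0-36 build."""
    # Assumption: Ubuntu release strings look like "4.15.0-36-generic".
    # Match the ABI prefix exactly so that e.g. "4.15.0-360" does not match.
    return release == AFFECTED_ABI or release.startswith(AFFECTED_ABI + "-")


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    if is_affected(running):
        print(f"{running}: affected, move to 4.15.0-38 or back to 4.15.0-34")
    else:
        print(f"{running}: not the 4.15.0-36 build")
```

This only looks at the kernel that is currently booted; a newer kernel
that is installed but not yet rebooted into will still show up as
affected.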

4.15.0-36 has a TCP bug[1] that can occasionally slow a TCP connection
down to a trickle of 2.5 Kbytes/s (one 512-byte segment every 200ms).
Once a TCP connection is in this state, it never recovers.

This started happening within our Ceph clusters after we reinstalled a
few servers as part of our Bluestore migration.  The effect on our RBD
users (OpenStack VMs) was pretty terrible - the typical 4MB transaction
would take about 27 MINUTES at this rate, causing timeouts and crashes.
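
The numbers above can be checked with a short Python calculation. It
assumes that "4MB" means 4 MiB (4 * 1024 * 1024 bytes) and that exactly
one 512-byte segment goes out every 200ms; the variable names are only
for illustration.

```python
SEGMENT_BYTES = 512                # payload of each segment
INTERVAL_MS = 200                  # one segment every 200 ms
TRANSFER_BYTES = 4 * 1024 * 1024   # assumption: "4MB" means 4 MiB

# Throughput of a stuck connection, in bytes per second.
rate_bytes_per_s = SEGMENT_BYTES * 1000 / INTERVAL_MS

# Time needed to move one 4 MiB transaction at that rate.
duration_s = TRANSFER_BYTES / rate_bytes_per_s
duration_min = duration_s / 60

print(f"rate:     {rate_bytes_per_s / 1024:.1f} KiB/s")
print(f"duration: {duration_s:.1f} s (about {duration_min:.0f} minutes)")
```

With 4 MB read as 4,000,000 bytes instead, the result is about 26
minutes, so the conclusion is the same either way.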

This was absolutely painful to diagnose, because it happened so rarely
and was hard to reproduce.  Fortunately the fix is easy - just don't run
this kernel.

I should note that our Ceph clusters run over IPv6.  I have not checked
whether the TCP bug also hits IPv4 (the original bug report was for IPv6
as well), although I see no reason why it wouldn't.
-- 
Simon.
[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1796895
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


