Ceph OSDs cause kernel to become unresponsive

Hi Cephers,

We have encountered a kernel hang issue on our Ceph cluster, just like http://imgur.com/a/U2Flz , http://imgur.com/a/lyEko or http://imgur.com/a/IGXdu .

We believe it is caused by running out of memory, because we observed that when the OSDs went crazy, the available memory on each node decreased rapidly (from 50% available to lower than 10%). The nodes running Ceph OSDs then became unresponsive, with the console showing hung_task_timeout, slab_out_of_memory, and similar messages. The only thing we could do at that point was a hard reset.
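For reference, a rough sketch like the following could be used to watch the total resident memory of the ceph-osd daemons on one node and correlate it with the drop in available memory (the "ceph-osd" process-name match and the /proc layout are assumptions on my side, not something Ceph itself provides):

#!/usr/bin/env python
# Rough sketch: sum the resident memory (RSS) of every ceph-osd process on
# this node, so OSD memory growth can be logged over time.
import os, time

def osd_rss_kib():
    total = 0
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/comm' % pid) as f:
                if f.read().strip() != 'ceph-osd':
                    continue
            with open('/proc/%s/status' % pid) as f:
                for line in f:
                    if line.startswith('VmRSS:'):
                        total += int(line.split()[1])  # value is in kB
        except IOError:
            continue  # process exited while we were reading it
    return total

while True:
    print('%s total ceph-osd RSS: %.1f GiB'
          % (time.ctime(), osd_rss_kib() / (1024.0 * 1024.0)))
    time.sleep(60)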

It is hard to predict when the kernel hang will happen. In my experience, it usually occurred after a long-running benchmark, followed by a manual trigger such as 1) rebooting a node, 2) restarting all OSDs, or 3) modifying the CRUSH map.

The cluster is currently back to normal, but we want to figure out the root cause so it does not happen again. We think the high values in our ceph.conf are pretty suspicious, but without tracing the code it is hard for us to understand how those values affect memory consumption.

Many thanks for any suggestions.

=================================================================================

The following is our Ceph cluster architecture:

OS: Ubuntu 16.04.1 LTS (4.4.0-31-generic #50-Ubuntu x86_64 GNU/Linux)
Ceph: Jewel 10.2.3

3 Ceph Monitors running on 3 dedicated machines
630 Ceph OSDs running on 7 storage machines (each machine has 256 GB of RAM and 90 x 8 TB hard drives)

There are 4 pools with the following settings:
vms     512  pg x 3 replica
images  512  pg x 3 replica
volumes 8192 pg x 3 replica
objects 4096 pg x (k=17, m=3) erasure code profile

==> average 173.92 PGs per OSD
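(That average is simply the total number of PG instances divided by the number of OSDs, counting the erasure-coded pool as k+m = 20 shards per PG. A quick sketch of the arithmetic:)

# Back-of-envelope check of the PGs-per-OSD figure above.
pools = [
    (512, 3),    # vms:     512 PGs x 3 replicas
    (512, 3),    # images:  512 PGs x 3 replicas
    (8192, 3),   # volumes: 8192 PGs x 3 replicas
    (4096, 20),  # objects: 4096 PGs x (k=17 + m=3) EC shards
]
osds = 630
total_pg_instances = sum(pg * copies for pg, copies in pools)
print(total_pg_instances / float(osds))  # -> ~173.92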

We tuned our ceph.conf by referencing many performance tuning resources online (mainly slide 38 of https://goo.gl/Idkh41 ):

[global]
osd pool default pg num = 4096
osd pool default pgp num = 4096
err to syslog = true
log to syslog = true
osd pool default size = 3
max open files = 131072
fsid = 1c33bf75-e080-4a70-9fd8-860ff216f595
osd crush chooseleaf type = 1

[mon.mon1]
host = mon1
mon addr = 172.20.1.2

[mon.mon2]
host = mon2
mon addr = 172.20.1.3

[mon.mon3]
host = mon3
mon addr = 172.20.1.4

[mon]
mon osd full ratio = 0.85
mon osd nearfull ratio = 0.7
mon osd down out interval = 600
mon osd down out subtree limit = host
mon allow pool delete = true
mon compact on start = true

[osd]
public_network = 172.20.3.1/21
cluster_network = 172.24.0.1/24
osd disk threads = 4
osd mount options xfs = rw,noexec,nodev,noatime,nodiratime,nobarrier,inode64,logbsize=256k
osd crush update on start = false
osd op threads = 20
osd mkfs options xfs = -f -i size=2048
osd max write size = 512
osd mkfs type = xfs
osd journal size = 5120
filestore max inline xattrs = 6
filestore queue committing max bytes = 1048576000
filestore queue committing max ops = 5000
filestore queue max bytes = 1048576000
filestore op threads = 32
filestore max inline xattr size = 254
filestore max sync interval = 15
filestore min sync interval = 10
journal max write bytes = 1048576000
journal max write entries = 1000
journal queue max ops = 3000
journal queue max bytes = 1048576000
ms dispatch throttle bytes = 1048576000
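For what it's worth, here is a back-of-envelope estimate of what the queue throttles above could pin in memory at once. It only multiplies the byte limits by the 90 OSDs per node, so it is a rough worst-case bound on in-flight data, not actual ceph-osd heap usage:

# Rough worst-case bound implied by the throttles above. Ignores heap
# overhead, PG metadata, caches, etc.; it only shows that the byte limits
# alone can exceed the 256 GB of RAM per node if every queue fills up.
GiB = 1024.0 ** 3
per_osd_bytes = (
    1048576000 +  # filestore queue max bytes
    1048576000 +  # filestore queue committing max bytes
    1048576000 +  # journal queue max bytes
    1048576000    # ms dispatch throttle bytes
)
osds_per_node = 90
print('per OSD : %.1f GiB' % (per_osd_bytes / GiB))                  # ~3.9 GiB
print('per node: %.1f GiB' % (per_osd_bytes * osds_per_node / GiB))  # ~351.6 GiB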
 
Sincerely,
Craig Chi

 

Sent from Synology MailPlus
