Hammer to Jewel Upgrade - Extreme OSD Boot Time

After upgrading from Ceph Hammer to Jewel, we are experiencing extremely long OSD boot times.

This is a major concern for us, and we are looking for insight into how to speed up the boot process.

In Hammer, OSD boot time was approx 3 minutes. After upgrading to Jewel, boot time is between 1 and 3 HOURS.

This was not surprising for the initial boot after the upgrade, but we are seeing it every time an OSD process is restarted.

These OSDs run on ZFS. As part of the upgrade we added the following settings to ceph.conf to work around some filesystem startup issues, per the recommendations at the following URL:

https://github.com/zfsonlinux/zfs/issues/4913

Added ceph.conf configuration:
filestore_max_inline_xattrs = 10
filestore_max_inline_xattr_size = 65536
filestore_max_xattr_value_size = 65536
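
For context, my understanding of what these settings do (a sketch based on the option names, not on the actual Ceph filestore source) is that they control when an object xattr stays inline on the filesystem versus spilling into the leveldb-backed object map:

```python
# Hypothetical sketch of filestore's inline-xattr spill decision,
# inferred from the option names above -- not the real Ceph code.

FILESTORE_MAX_INLINE_XATTRS = 10         # count limit, from our ceph.conf
FILESTORE_MAX_INLINE_XATTR_SIZE = 65536  # per-value byte limit, from our ceph.conf

def stored_inline(xattrs):
    """Return the subset of xattrs kept inline as filesystem xattrs;
    anything over the limits would go to the object map (leveldb)."""
    inline = {}
    for name, value in xattrs.items():
        if (len(inline) < FILESTORE_MAX_INLINE_XATTRS
                and len(value) <= FILESTORE_MAX_INLINE_XATTR_SIZE):
            inline[name] = value
    return inline
```

If that reading is right, raising the limits pushes more xattr traffic onto ZFS (hence the getxattr calls below) rather than onto leveldb.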


Example OSD log (note the roughly 5.5-hour gap between the line containing "osd.191 119292 crush map has features 281819681652736, adjusting msgr requires for osds" and the load_pgs line that follows it):

2017-10-24 18:01:18.410249 7f1333d08700  1 leveldb: Generated table #524178: 158056 keys, 1502244 bytes
2017-10-24 18:01:18.805235 7f1333d08700  1 leveldb: Generated table #524179: 266429 keys, 2129196 bytes
2017-10-24 18:01:19.254798 7f1333d08700  1 leveldb: Generated table #524180: 197068 keys, 2128820 bytes
2017-10-24 18:01:20.070109 7f1333d08700  1 leveldb: Generated table #524181: 192675 keys, 2129122 bytes
2017-10-24 18:01:20.947818 7f1333d08700  1 leveldb: Generated table #524182: 196806 keys, 2128945 bytes
2017-10-24 18:01:21.183475 7f1333d08700  1 leveldb: Generated table #524183: 63421 keys, 828081 bytes
2017-10-24 18:01:21.477197 7f1333d08700  1 leveldb: Generated table #524184: 173331 keys, 1348407 bytes
2017-10-24 18:01:21.477226 7f1333d08700  1 leveldb: Compacted 1@2 + 12@3 files => 19838392 bytes
2017-10-24 18:01:21.509952 7f1333d08700  1 leveldb: compacted to: files[ 0 1 66 551 788 0 0 ]
2017-10-24 18:01:21.512235 7f1333d08700  1 leveldb: Delete type=2 #523994
2017-10-24 18:01:23.142853 7f1349d93800  0 filestore(/osd/191) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2017-10-24 18:01:27.927823 7f1349d93800  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
2017-10-24 18:01:27.933105 7f1349d93800  0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
2017-10-24 18:01:27.960283 7f1349d93800  0 osd.191 119292 crush map has features 281544803745792, adjusting msgr requires for clients
2017-10-24 18:01:27.960309 7f1349d93800  0 osd.191 119292 crush map has features 281819681652736 was 8705, adjusting msgr requires for mons
2017-10-24 18:01:27.960316 7f1349d93800  0 osd.191 119292 crush map has features 281819681652736, adjusting msgr requires for osds
2017-10-24 23:28:09.694213 7f1349d93800  0 osd.191 119292 load_pgs
2017-10-24 23:28:14.757449 7f1333d08700  1 leveldb: Compacting 1@1 + 13@2 files
2017-10-24 23:28:15.002381 7f1333d08700  1 leveldb: Generated table #524185: 17970 keys, 2128900 bytes
2017-10-24 23:28:15.198899 7f1333d08700  1 leveldb: Generated table #524186: 22386 keys, 2128610 bytes
2017-10-24 23:28:15.337819 7f1333d08700  1 leveldb: Generated table #524187: 3890 keys, 371799 bytes
2017-10-24 23:28:15.693433 7f1333d08700  1 leveldb: Generated table #524188: 21984 keys, 2128947 bytes
2017-10-24 23:28:15.874955 7f1333d08700  1 leveldb: Generated table #524189: 9565 keys, 1207375 bytes
2017-10-24 23:28:16.253599 7f1333d08700  1 leveldb: Generated table #524190: 21999 keys, 2129625 bytes
2017-10-24 23:28:16.576250 7f1333d08700  1 leveldb: Generated table #524191: 21544 keys, 2128033 bytes


Running strace on an OSD process during startup shows what appears to be a walk over the on-disk objects, issuing a getxattr call for each one. The bulk of the time is spent in these getxattr system calls. For example:

(Full lines truncated intentionally for brevity.)
[pid 3068964] getxattr("/osd/174/current/20.6a4s7_head/default.7385.13\...(omitted)
[pid 3068964] getxattr("/osd/174/current/20.6a4s7_head/default.7385.5\...(omitted)
[pid 3068964] getxattr("/osd/174/current/20.6a4s7_head/default.7385.5\...(omitted)
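
To put a number on this, one approach (a sketch; it assumes strace was run with -T so each completed call ends in a <seconds> duration, which is an assumption about the capture, not something shown above) is to sum the time attributed to getxattr:

```python
import re

# Sketch: sum per-call durations from `strace -f -T -e trace=getxattr`
# style output, where each completed call ends in e.g. "<0.004821>".
# The regex is an assumption about strace's -T line format.
DURATION_RE = re.compile(r'getxattr\(.*<(\d+\.\d+)>\s*$')

def total_getxattr_seconds(lines):
    """Total wall time spent in getxattr across the traced lines."""
    total = 0.0
    for line in lines:
        m = DURATION_RE.search(line)
        if m:
            total += float(m.group(1))
    return total
```

Comparing that total against the elapsed boot time would confirm (or rule out) getxattr as the dominant cost.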

Cluster details:
- 9 hosts (32 cores, 256 GB RAM, Ubuntu 14.04 3.16.0-77-generic, 72 6TB SAS2 drives per host, collocated journals)
- Pre-upgrade: Hammer (ceph version 0.94.6)
- Post-upgrade: Jewel (ceph version 10.2.9)
- object storage use only
- erasure coded (k=7, m=2) .rgw.buckets pool (8192 pgs)
- failure domain of host
- cluster is currently storing approx. 500 TB across roughly 200 million objects
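
A back-of-envelope check (my own arithmetic; the per-call latency is a guessed figure for illustration, not a measurement from our cluster): with ~200 M objects and EC 7+2, each object has 9 shards spread over 9 x 72 = 648 OSDs, so roughly 2.8 M object shards per OSD. Even a few milliseconds per getxattr then adds up to hours:

```python
# Rough arithmetic only; the 4 ms per-getxattr latency is a hypothetical
# value chosen for illustration, not measured on our hardware.
objects = 200e6            # ~200 M objects cluster-wide
shards_per_object = 7 + 2  # EC k=7, m=2 -> 9 shards per object
osds = 9 * 72              # 9 hosts x 72 drives

shards_per_osd = objects * shards_per_object / osds  # ~2.8 million
latency_s = 0.004          # assumed 4 ms per getxattr on ZFS

boot_hours = shards_per_osd * latency_s / 3600
print(round(shards_per_osd), round(boot_hours, 1))
```

Under that assumed latency this comes out to about 3 hours per OSD, which is in the same ballpark as what we are seeing.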


--------------------------------------------------
Christopher J. Jones

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
