Re: Long OSD restart after upgrade to 10.2.9

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



root@storage07:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty

root@storage07:~$ uname -a
Linux storage07 4.4.0-83-generic #106~14.04.1-Ubuntu SMP Mon Jun 26 18:10:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

root@storage07:~$ dpkg -l | grep leveld
ii libleveldb1:amd64 1.15.0-2 amd64 fast key-value storage library

From /var/log/apt/history.log nothing more was installed or upgraded for the last few months except listed below. Leveldb was not upgraded recently.

Start-Date: 2017-07-15  17:35:25
Commandline: apt upgrade
Install: linux-headers-4.4.0-83-generic:amd64 (4.4.0-83.106~14.04.1, automatic), linux-image-4.4.0-83-generic:amd64 (4.4.0-83.106~14.04.1, automatic), linux-image-3.13.0-123-generic:amd64 (3.13.0-123.172, automatic), linux-image-extra-3.13.0-123-generic:amd64 (3.13.0-123.172, automatic), linux-headers-4.4.0-83:amd64 (4.4.0-83.106~14.04.1, automatic), linux-image-extra-4.4.0-83-generic:amd64 (4.4.0-83.106~14.04.1, automatic), linux-headers-3.13.0-123-generic:amd64 (3.13.0-123.172, automatic), linux-headers-3.13.0-123:amd64 (3.13.0-123.172, automatic) Upgrade: bind9-host:amd64 (9.9.5.dfsg-3ubuntu0.14, 9.9.5.dfsg-3ubuntu0.15), python3-problem-report:amd64 (2.14.1-0ubuntu3.23, 2.14.1-0ubuntu3.24), liblwres90:amd64 (9.9.5.dfsg-3ubuntu0.14, 9.9.5.dfsg-3ubuntu0.15), libnl-genl-3-200:amd64 (3.2.21-1ubuntu4, 3.2.21-1ubuntu4.1), linux-headers-generic:amd64 (3.13.0.119.129, 3.13.0.123.133), libgnutls-openssl27:amd64 (2.12.23-12ubuntu2.7, 2.12.23-12ubuntu2.8), multiarch-support:amd64 (2.19-0ubuntu6.11, 2.19-0ubuntu6.13), libdns100:amd64 (9.9.5.dfsg-3ubuntu0.14, 9.9.5.dfsg-3ubuntu0.15), radosgw:amd64 (10.2.7-1trusty, 10.2.9-1trusty), libisccfg90:amd64 (9.9.5.dfsg-3ubuntu0.14, 9.9.5.dfsg-3ubuntu0.15), isc-dhcp-common:amd64 (4.2.4-7ubuntu12.8, 4.2.4-7ubuntu12.10), libbind9-90:amd64 (9.9.5.dfsg-3ubuntu0.14, 9.9.5.dfsg-3ubuntu0.15), python-cephfs:amd64 (10.2.7-1trusty, 10.2.9-1trusty), librbd1:amd64 (10.2.7-1trusty, 10.2.9-1trusty), sudo:amd64 (1.8.9p5-1ubuntu1.3, 1.8.9p5-1ubuntu1.4), libradosstriper1:amd64 (10.2.7-1trusty, 10.2.9-1trusty), python3-software-properties:amd64 (0.92.37.7, 0.92.37.8), linux-tools-common:amd64 (3.13.0-119.166, 3.13.0-123.172), libc-dev-bin:amd64 (2.19-0ubuntu6.11, 2.19-0ubuntu6.13), librados2:amd64 (10.2.7-1trusty, 10.2.9-1trusty), libc-bin:amd64 (2.19-0ubuntu6.11, 2.19-0ubuntu6.13), libc6:amd64 (2.19-0ubuntu6.11, 2.19-0ubuntu6.13), linux-generic-lts-xenial:amd64 (4.4.0.78.63, 4.4.0.83.68), dnsutils:amd64 (9.9.5.dfsg-3ubuntu0.14, 9.9.5.dfsg-3ubuntu0.15), ceph-base:amd64 (10.2.7-1trusty, 10.2.9-1trusty), libnl-3-200:amd64 (3.2.21-1ubuntu4, 3.2.21-1ubuntu4.1), klibc-utils:amd64 (2.0.3-0ubuntu1.14.04.2, 2.0.3-0ubuntu1.14.04.3), ceph-osd:amd64 (10.2.7-1trusty, 10.2.9-1trusty), gnutls-bin:amd64 (3.0.11+really2.12.23-12ubuntu2.7, 3.0.11+really2.12.23-12ubuntu2.8), libldap-2.4-2:amd64 (2.4.31-1+nmu2ubuntu8.3, 2.4.31-1+nmu2ubuntu8.4), libnss3-nssdb:amd64 (3.28.4-0ubuntu0.14.04.1, 3.28.4-0ubuntu0.14.04.2), linux-headers-generic-lts-xenial:amd64 (4.4.0.78.63, 4.4.0.83.68), ntp:amd64 (4.2.6.p5+dfsg-3ubuntu2.14.04.10, 4.2.6.p5+dfsg-3ubuntu2.14.04.11), ceph-mds:amd64 (10.2.7-1trusty, 10.2.9-1trusty), ceph:amd64 (10.2.7-1trusty, 10.2.9-1trusty), ceph-common:amd64 (10.2.7-1trusty, 10.2.9-1trusty), librgw2:amd64 (10.2.7-1trusty, 10.2.9-1trusty), vlan:amd64 (1.9-3ubuntu10.1, 1.9-3ubuntu10.4), isc-dhcp-client:amd64 (4.2.4-7ubuntu12.8, 4.2.4-7ubuntu12.10), libnss3:amd64 (3.28.4-0ubuntu0.14.04.1, 3.28.4-0ubuntu0.14.04.2), python-rbd:amd64 (10.2.7-1trusty, 10.2.9-1trusty), libgnutls26:amd64 (2.12.23-12ubuntu2.7, 2.12.23-12ubuntu2.8), apport:amd64 (2.14.1-0ubuntu3.23, 2.14.1-0ubuntu3.24), libklibc:amd64 (2.0.3-0ubuntu1.14.04.2, 2.0.3-0ubuntu1.14.04.3), ceph-mon:amd64 (10.2.7-1trusty, 10.2.9-1trusty), libcephfs1:amd64 (10.2.7-1trusty, 10.2.9-1trusty), linux-image-generic-lts-xenial:amd64 (4.4.0.78.63, 4.4.0.83.68), python3-apport:amd64 (2.14.1-0ubuntu3.23, 2.14.1-0ubuntu3.24), software-properties-common:amd64 (0.92.37.7, 0.92.37.8), linux-libc-dev:amd64 (3.13.0-119.166, 3.13.0-123.172), libtasn1-6:amd64 (3.4-3ubuntu0.4, 3.4-3ubuntu0.5), linux-image-generic:amd64 (3.13.0.119.129, 3.13.0.123.133), libisccc90:amd64 (9.9.5.dfsg-3ubuntu0.14, 9.9.5.dfsg-3ubuntu0.15), libgcrypt11:amd64 (1.5.3-2ubuntu4.4, 1.5.3-2ubuntu4.5), libc6-dev:amd64 (2.19-0ubuntu6.11, 2.19-0ubuntu6.13), libisc95:amd64 (9.9.5.dfsg-3ubuntu0.14, 9.9.5.dfsg-3ubuntu0.15), python-rados:amd64 (10.2.7-1trusty, 10.2.9-1trusty), ntpdate:amd64 (4.2.6.p5+dfsg-3ubuntu2.14.04.10, 4.2.6.p5+dfsg-3ubuntu2.14.04.11), linux-generic:amd64 (3.13.0.119.129, 3.13.0.123.133)
End-Date: 2017-07-15  17:39:43

root@storage07:~$ ceph --admin-daemon /var/run/ceph/ceph-osd.195.asok config show | grep leveldb
    "debug_leveldb": "20\/20",
    "mon_keyvaluedb": "leveldb",
    "osd_compact_leveldb_on_mount": "false",
    "leveldb_log_to_ceph_log": "true",
    "leveldb_write_buffer_size": "33554432",
    "leveldb_cache_size": "536870912",
    "leveldb_block_size": "0",
    "leveldb_bloom_size": "0",
    "leveldb_max_open_files": "0",
    "leveldb_compression": "true",
    "leveldb_paranoid": "false",
    "leveldb_log": "",
    "leveldb_compact_on_mount": "false",
    "filestore_omap_backend": "leveldb",


Logs with debug 20 leveldb, osd, filestore:

2017-07-19 09:01:48.580299 7fa45ef68840 5 filestore(/var/lib/ceph/osd/ceph-195) basedir /var/lib/ceph/osd/ceph-195 journal /var/lib/ceph/osd/ceph-195/journal 2017-07-19 09:01:48.580319 7fa45ef68840 10 filestore(/var/lib/ceph/osd/ceph-195) mount fsid is 44cb829d-8f11-4b9c-b0a4-50b3230a83a2 2017-07-19 09:01:48.580523 7fa45ef68840 10 filestore(/var/lib/ceph/osd/ceph-195) version_stamp_is_valid was 4 vs target 4 2017-07-19 09:01:48.582436 7fa45ef68840 0 filestore(/var/lib/ceph/osd/ceph-195) backend xfs (magic 0x58465342) 2017-07-19 09:01:48.585766 7fa45ef68840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-195) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2017-07-19 09:01:48.585775 7fa45ef68840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-195) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option 2017-07-19 09:01:48.585794 7fa45ef68840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-195) detect_features: splice is supported 2017-07-19 09:01:48.736795 7fa45ef68840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-195) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) 2017-07-19 09:01:48.736933 7fa45ef68840 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-195) detect_feature: extsize is disabled by conf 2017-07-19 09:01:48.737359 7fa45ef68840 5 filestore(/var/lib/ceph/osd/ceph-195) mount op_seq is 148692887
2017-07-19 09:01:48.775603 7fa45ef68840  1 leveldb: Recovering log #77851
2017-07-19 09:01:48.963421 7fa45ef68840 1 leveldb: Level-0 table #77854: started 2017-07-19 09:01:49.062159 7fa45ef68840 1 leveldb: Level-0 table #77854: 3929319 bytes OK
2017-07-19 09:01:49.121046 7fa45ef68840  1 leveldb: Delete type=0 #77851

2017-07-19 09:01:49.123593 7fa45ef68840  1 leveldb: Delete type=3 #77076

2017-07-19 09:01:49.124087 7fa45138f700 1 leveldb: Compacting 4@0 + 6@1 files 2017-07-19 09:01:49.537744 7fa45138f700 1 leveldb: Generated table #77856: 118299 keys, 2132036 bytes 2017-07-19 09:01:49.729436 7fa45138f700 1 leveldb: Generated table #77857: 2575 keys, 150474 bytes 2017-07-19 09:01:49.954543 7fa45138f700 1 leveldb: Generated table #77858: 105450 keys, 2133258 bytes 2017-07-19 09:01:50.179849 7fa45138f700 1 leveldb: Generated table #77859: 97031 keys, 2132683 bytes 2017-07-19 09:01:50.379805 7fa45138f700 1 leveldb: Generated table #77860: 76399 keys, 1859291 bytes 2017-07-19 09:01:50.588347 7fa45138f700 1 leveldb: Generated table #77861: 103092 keys, 2132002 bytes 2017-07-19 09:01:50.738353 7fa45138f700 1 leveldb: Generated table #77862: 58698 keys, 1537699 bytes 2017-07-19 09:01:50.738970 7fa45138f700 1 leveldb: Compacted 4@0 + 6@1 files => 12077443 bytes 2017-07-19 09:01:50.829871 7fa45138f700 1 leveldb: compacted to: files[ 0 7 50 553 948 0 0 ] 2017-07-19 09:01:50.830829 7fa45138f700 1 leveldb: Expanding@1 1+15 (2132036+22021286 bytes) to 2+15 (2282510+22021286 bytes)










On 19.07.2017 00:31, Josh Durgin wrote:
On 07/17/2017 10:04 PM, Anton Dmitriev wrote:
My cluster stores more than 1.5 billion objects in RGW, cephfs I dont use. Bucket index pool stored on separate SSD placement. But compaction occurs on all OSD, also on those, which doesn`t contain bucket indexes. After restarting 5 times every OSD nothing changed, each of them doing comapct again and again.

As an example omap dir size on one of OSDs, which doesnt contain bucket indexes.

root@storage01:/var/lib/ceph/osd/ceph-0/current/omap$ ls -l | wc -l
1455
root@storage01:/var/lib/ceph/osd/ceph-0/current/omap$ du -hd1
2,8G

Not so big at first look.

That is smaller than I'd expect to get long delays from compaction.
Could you provide more details on your environment - distro version and leveldb version? Has leveldb been updated recently? Perhaps there's some commonality between setups hitting this.

There weren't any changes in the way ceph was using leveldb from 10.2.7
to 10.2.9 that I could find.

Josh

On 17.07.2017 22:03, Josh Durgin wrote:
Both of you are seeing leveldb perform compaction when the osd starts
up. This can take a while for large amounts of omap data (created by
things like cephfs directory metadata or rgw bucket indexes).

The 'leveldb_compact_on_mount' option wasn't changed in 10.2.9, but leveldb will compact automatically if there is enough work to do.

Does restarting an OSD affected by this with 10.2.9 again after it's
completed compaction still have these symptoms?

Josh

On 07/17/2017 05:57 AM, Lincoln Bryant wrote:
Hi Anton,

We observe something similar on our OSDs going from 10.2.7 to 10.2.9 (see thread "some OSDs stuck down after 10.2.7 -> 10.2.9 update"). Some of our OSDs are not working at all on 10.2.9 or die with suicide timeouts. Those that come up/in take a very long time to boot up. Seems to not affect every OSD in our case though.

--Lincoln

On 7/17/2017 1:29 AM, Anton Dmitriev wrote:
During start it consumes ~90% CPU, strace shows, that OSD process doing something with LevelDB.
Compact is disabled:
root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:~$ cat /etc/ceph/ceph.conf | grep compact
#leveldb_compact_on_mount = true

But with debug_leveldb=20 I see, that compaction is running, but why?

2017-07-17 09:27:37.394008 7f4ed2293700 1 leveldb: Compacting 1@1 + 12@2 files 2017-07-17 09:27:37.593890 7f4ed2293700 1 leveldb: Generated table #76778: 277817 keys, 2125970 bytes 2017-07-17 09:27:37.718954 7f4ed2293700 1 leveldb: Generated table #76779: 221451 keys, 2124338 bytes 2017-07-17 09:27:37.777362 7f4ed2293700 1 leveldb: Generated table #76780: 63755 keys, 809913 bytes 2017-07-17 09:27:37.919094 7f4ed2293700 1 leveldb: Generated table #76781: 231475 keys, 2026376 bytes 2017-07-17 09:27:38.035906 7f4ed2293700 1 leveldb: Generated table #76782: 190956 keys, 1573332 bytes 2017-07-17 09:27:38.127597 7f4ed2293700 1 leveldb: Generated table #76783: 148675 keys, 1260956 bytes 2017-07-17 09:27:38.286183 7f4ed2293700 1 leveldb: Generated table #76784: 294105 keys, 2123438 bytes 2017-07-17 09:27:38.469562 7f4ed2293700 1 leveldb: Generated table #76785: 299617 keys, 2124267 bytes 2017-07-17 09:27:38.619666 7f4ed2293700 1 leveldb: Generated table #76786: 277305 keys, 2124936 bytes 2017-07-17 09:27:38.711423 7f4ed2293700 1 leveldb: Generated table #76787: 110536 keys, 951545 bytes 2017-07-17 09:27:38.869917 7f4ed2293700 1 leveldb: Generated table #76788: 296199 keys, 2123506 bytes 2017-07-17 09:27:39.028395 7f4ed2293700 1 leveldb: Generated table #76789: 248634 keys, 2096715 bytes 2017-07-17 09:27:39.028414 7f4ed2293700 1 leveldb: Compacted 1@1 + 12@2 files => 21465292 bytes 2017-07-17 09:27:39.053288 7f4ed2293700 1 leveldb: compacted to: files[ 0 0 48 549 948 0 0 ] 2017-07-17 09:27:39.054014 7f4ed2293700 1 leveldb: Delete type=2 #76741

Strace:

open("/var/lib/ceph/osd/ceph-195/current/omap/043788.ldb", O_RDONLY) = 18 stat("/var/lib/ceph/osd/ceph-195/current/omap/043788.ldb", {st_mode=S_IFREG|0644, st_size=2154394, ...}) = 0
mmap(NULL, 2154394, PROT_READ, MAP_SHARED, 18, 0) = 0x7f96a67a0000
close(18)                               = 0
brk(0x55d156640000)                     = 0x55d156640000
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [PIPE], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], ~[KILL STOP RTMIN RT_1], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
munmap(0x7f96fca08000, 2125056)




On 17.07.2017 09:13, Anton Dmitriev wrote:
Thanks for reply.
I restarted OSD with debug_ms = 1/1 and debug_osd = 20/20.

Look at this:
2017-07-17 08:57:52.077481 7f4db319c840 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-167) detect_feature: extsize is disabled by conf 2017-07-17 09:04:04.345065 7f4db319c840 0 filestore(/var/lib/ceph/osd/ceph-167) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled

Nothing is happening for a 6 minutes. Maybe try another debug option? Maybe debug_filestore ?


2017-07-17 08:57:32.072881 7f4ca5aaa840 10 -- :/6773 wait: closing pipes
2017-07-17 08:57:32.072882 7f4ca5aaa840 10 -- :/6773 reaper
2017-07-17 08:57:32.072884 7f4ca5aaa840 10 -- :/6773 reaper done
2017-07-17 08:57:32.072886 7f4ca5aaa840 10 -- :/6773 wait: waiting for pipes to close
2017-07-17 08:57:32.072888 7f4ca5aaa840 10 -- :/6773 wait: done.
2017-07-17 08:57:32.072889 7f4ca5aaa840 1 -- :/6773 shutdown complete. 2017-07-17 08:57:52.015484 7f4db319c840 0 set uid:gid to 4402:4402 (ceph:ceph) 2017-07-17 08:57:52.015493 7f4db319c840 0 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd, pid 3966896 2017-07-17 08:57:52.015664 7f4db319c840 5 object store type is filestore 2017-07-17 08:57:52.016680 7f4db319c840 1 -- 10.17.12.130:0/0 learned my addr 10.17.12.130:0/0 2017-07-17 08:57:52.016688 7f4db319c840 1 accepter.accepter.bind my_inst.addr is 10.17.12.130:6867/3966896 need_addr=0 2017-07-17 08:57:52.016721 7f4db319c840 1 -- 10.17.27.14:0/0 learned my addr 10.17.27.14:0/0 2017-07-17 08:57:52.016725 7f4db319c840 1 accepter.accepter.bind my_inst.addr is 10.17.27.14:6847/3966896 need_addr=0 2017-07-17 08:57:52.016750 7f4db319c840 1 -- 10.17.27.14:0/0 learned my addr 10.17.27.14:0/0 2017-07-17 08:57:52.016754 7f4db319c840 1 accepter.accepter.bind my_inst.addr is 10.17.27.14:6848/3966896 need_addr=0 2017-07-17 08:57:52.016783 7f4db319c840 1 -- 10.17.12.130:0/0 learned my addr 10.17.12.130:0/0 2017-07-17 08:57:52.016787 7f4db319c840 1 accepter.accepter.bind my_inst.addr is 10.17.12.130:6868/3966896 need_addr=0 2017-07-17 08:57:52.016789 7f4db319c840 0 pidfile_write: ignore empty --pid-file 2017-07-17 08:57:52.022153 7f4db319c840 10 ErasureCodePluginSelectJerasure: load: jerasure_sse4 2017-07-17 08:57:52.024194 7f4db319c840 10 load: jerasure load: lrc load: isa 2017-07-17 08:57:52.024388 7f4db319c840 1 -- 10.17.12.130:6867/3966896 messenger.start
2017-07-17 08:57:52.024418 7f4db319c840  1 -- :/0 messenger.start
2017-07-17 08:57:52.024439 7f4db319c840 1 -- 10.17.12.130:6868/3966896 messenger.start 2017-07-17 08:57:52.024462 7f4db319c840 1 -- 10.17.27.14:6848/3966896 messenger.start 2017-07-17 08:57:52.024481 7f4db319c840 1 -- 10.17.27.14:6847/3966896 messenger.start
2017-07-17 08:57:52.024501 7f4db319c840  1 -- :/0 messenger.start
2017-07-17 08:57:52.024574 7f4db319c840 2 osd.167 0 mounting /var/lib/ceph/osd/ceph-167 /var/lib/ceph/osd/ceph-167/journal 2017-07-17 08:57:52.025031 7f4db319c840 0 filestore(/var/lib/ceph/osd/ceph-167) backend xfs (magic 0x58465342) 2017-07-17 08:57:52.032406 7f4db319c840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-167) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2017-07-17 08:57:52.032415 7f4db319c840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-167) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option 2017-07-17 08:57:52.032430 7f4db319c840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-167) detect_features: splice is supported 2017-07-17 08:57:52.077404 7f4db319c840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-167) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) 2017-07-17 08:57:52.077481 7f4db319c840 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-167) detect_feature: extsize is disabled by conf 2017-07-17 09:04:04.345065 7f4db319c840 0 filestore(/var/lib/ceph/osd/ceph-167) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2017-07-17 09:04:05.751209 7f4db319c840  2 osd.167 0 boot
2017-07-17 09:04:05.751385 7f4db319c840 20 osd.167 0 configured osd_max_object_name[space]_len looks ok 2017-07-17 09:04:05.751468 7f4db319c840 10 osd.167 0 read_superblock sb(2dc4a5fc-8acd-480e-a444-f091d02271b8 osd.167 559efd6b-7a47-4400-9645-39e4c8b210e9 e224604 [223964,224604] lci=[224038,224604])
2017-07-17 09:04:05.775781 7f4db319c840 10 open_all_classes
2017-07-17 09:04:05.775833 7f4db319c840 10 open_all_classes found cephfs 2017-07-17 09:04:05.775853 7f4db319c840 10 _get_class adding new class name cephfs 0x55aadeb42548 2017-07-17 09:04:05.775858 7f4db319c840 10 _load_class cephfs from /usr/lib/rados-classes/libcls_cephfs.so 2017-07-17 09:04:05.776094 7f4db319c840 0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan 2017-07-17 09:04:05.776101 7f4db319c840 10 register_class cephfs status 3



On 17.07.2017 08:41, Irek Fasikhov wrote:
Hi, Anton.
You need to run the OSD with debug_ms = 1/1 and debug_osd = 20/20 for detailed information.

2017-07-17 8:26 GMT+03:00 Anton Dmitriev <tech@xxxxxxxxxx <mailto:tech@xxxxxxxxxx>>:

    Hi, all!

After upgrading from 10.2.7 to 10.2.9 I see that restarting osds by 'restart ceph-osd id=N' or 'restart ceph-osd-all' takes about
    10 minutes for getting OSD from DOWN to UP. The same situation
    on all 208 OSDs on 7 servers.

    Also very long OSD start after rebooting servers.

    Before upgrade it took no more than 2 minutes.

    Does anyone has the same situation like mine?


    2017-07-17 08:07:26.895600 7fac2d656840  0 set uid:gid to
    4402:4402 (ceph:ceph)
    2017-07-17 08:07:26.895615 7fac2d656840  0 ceph version 10.2.9
    (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0), process ceph-osd,
    pid 197542
2017-07-17 08:07:26.897018 7fac2d656840 0 pidfile_write: ignore
    empty --pid-file
    2017-07-17 08:07:26.906489 7fac2d656840  0
filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
    2017-07-17 08:07:26.917074 7fac2d656840  0
    genericfilestorebackend(/var/lib/ceph/osd/ceph-0)
detect_features: FIEMAP ioctl is disabled via 'filestore fiemap'
    config option
    2017-07-17 08:07:26.917092 7fac2d656840  0
    genericfilestorebackend(/var/lib/ceph/osd/ceph-0)
    detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore
    seek data hole' config option
    2017-07-17 08:07:26.917112 7fac2d656840  0
    genericfilestorebackend(/var/lib/ceph/osd/ceph-0)
    detect_features: splice is supported
    2017-07-17 08:07:27.037031 7fac2d656840  0
    genericfilestorebackend(/var/lib/ceph/osd/ceph-0)
detect_features: syncfs(2) syscall fully supported (by glibc and
    kernel)
    2017-07-17 08:07:27.037154 7fac2d656840  0
    xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature:
    extsize is disabled by conf
    2017-07-17 08:15:17.839072 7fac2d656840  0
    filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD
    journal mode: checkpoint is not enabled
    2017-07-17 08:15:20.150446 7fac2d656840  0 <cls>
    cls/hello/cls_hello.cc:305: loading cls_hello
    2017-07-17 08:15:20.152483 7fac2d656840  0 <cls>
    cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
    2017-07-17 08:15:20.210428 7fac2d656840  0 osd.0 224167 crush
map has features 2200130813952, adjusting msgr requires for clients
    2017-07-17 08:15:20.210443 7fac2d656840  0 osd.0 224167 crush
map has features 2200130813952 was 8705, adjusting msgr requires
    for mons
    2017-07-17 08:15:20.210448 7fac2d656840  0 osd.0 224167 crush
map has features 2200130813952, adjusting msgr requires for osds 2017-07-17 08:15:58.902173 7fac2d656840 0 osd.0 224167 load_pgs 2017-07-17 08:16:19.083406 7fac2d656840 0 osd.0 224167 load_pgs
    opened 242 pgs
    2017-07-17 08:16:19.083969 7fac2d656840  0 osd.0 224167 using 0
    op queue with priority op cut off at 64.
    2017-07-17 08:16:19.109547 7fac2d656840 -1 osd.0 224167
    log_to_monitors {default=true}
    2017-07-17 08:16:19.522448 7fac2d656840  0 osd.0 224167 done
    with init, starting boot process

    --     Dmitriev Anton

    _______________________________________________
    ceph-users mailing list
    ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
<http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>



--
Dmitriev Anton


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Dmitriev Anton


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





--
Dmitriev Anton

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux