Hi all,

we have a ceph cluster with 7 OSD nodes (Debian Jessie, chosen for the patched tcmalloc, running ceph 0.94), which we are expanding with one further node. For the new node we use puppet with Debian 7.8, because ceph 0.92.2 doesn't install on Jessie (the upgrade to 0.94.1 worked on the other nodes, but 0.94.2 doesn't look clean because the ceph package is still at 0.94.1).

The ceph.conf is the same cluster-wide, and the OSDs on all nodes were initialized with ceph-deploy (with only a few exceptions). All OSDs use ext4 (switched from xfs while the cluster ran ceph 0.80.7), and "filestore xattr use omap = true" is set in ceph.conf.

I'm wondering why the omap format differs between the nodes. The new wheezy node uses .sst files:

ls -lsa /var/lib/ceph/osd/ceph-92/current/omap/
...
2084 -rw-r--r-- 1 root root 2131113 Jul 20 17:45 000098.sst
2084 -rw-r--r-- 1 root root 2131913 Jul 20 17:45 000099.sst
2084 -rw-r--r-- 1 root root 2130623 Jul 20 17:45 000111.sst
...

whereas the jessie nodes use .ldb (LevelDB) files:

ls -lsa /var/lib/ceph/osd/ceph-1/current/omap/
...
2084 -rw-r--r-- 1 root root 2130468 Jul 20 22:33 000080.ldb
2084 -rw-r--r-- 1 root root 2130827 Jul 20 22:33 000081.ldb
2084 -rw-r--r-- 1 root root 2130171 Jul 20 22:33 000088.ldb
...
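For comparing the omap formats across nodes, a small sketch like this may help; the function name is my own and the default base path is only an assumption about your layout:

```shell
#!/bin/sh
# Sketch: count .sst vs .ldb omap files per OSD data dir, to spot
# OSDs with mixed or old LevelDB table formats.
# count_omap_types is a hypothetical helper, not a ceph tool.
count_omap_types() {
    base=${1:-/var/lib/ceph/osd}    # assumed default OSD data path
    for d in "$base"/ceph-*/current/omap; do
        [ -d "$d" ] || continue     # skip if nothing matches the glob
        sst=$(find "$d" -maxdepth 1 -name '*.sst' | wc -l | tr -d ' ')
        ldb=$(find "$d" -maxdepth 1 -name '*.ldb' | wc -l | tr -d ' ')
        echo "$d: $sst sst, $ldb ldb"
    done
}
count_omap_types "$@"
```

Run once per node (or via your config management) and compare the per-OSD counts; an OSD showing both extensions still carries old-format tables.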
On some OSDs I also found old .sst files dating from the wheezy/ceph 0.87 days:

ls -lsa /var/lib/ceph/osd/ceph-23/current/omap/*.sst
2096 -rw-r--r-- 1 root root 2142558 Apr  3 15:59 /var/lib/ceph/osd/ceph-23/current/omap/016722.sst
2092 -rw-r--r-- 1 root root 2141968 Apr  3 15:59 /var/lib/ceph/osd/ceph-23/current/omap/016723.sst
2092 -rw-r--r-- 1 root root 2141679 Apr  3 15:59 /var/lib/ceph/osd/ceph-23/current/omap/016724.sst
2096 -rw-r--r-- 1 root root 2142376 Apr  3 15:59 /var/lib/ceph/osd/ceph-23/current/omap/016725.sst
2096 -rw-r--r-- 1 root root 2142227 Apr  3 15:59 /var/lib/ceph/osd/ceph-23/current/omap/016726.sst
2092 -rw-r--r-- 1 root root 2141369 Apr 20 21:23 /var/lib/ceph/osd/ceph-23/current/omap/019470.sst

but many more .ldb files:

ls -lsa /var/lib/ceph/osd/ceph-23/current/omap/*.ldb | wc -l
128

The config shows leveldb as the omap backend for OSDs on both kinds of node (old, and new with .sst files):

ceph --admin-daemon /var/run/ceph/ceph-osd.92.asok config show | grep -i omap
  "filestore_omap_backend": "leveldb",
  "filestore_debug_omap_check": "false",
  "filestore_omap_header_cache_size": "1024",

Normally I wouldn't care about this, but when I tried to switch the first OSD node to a clean puppet install, I found that none of its OSDs would start. The error message looks a bit like http://tracker.ceph.com/issues/11429, but that shouldn't happen here, because the puppet install has ceph 0.94.2.
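The single-socket query above can be extended to all OSDs on a host; a hedged sketch, assuming the default admin-socket path:

```shell
#!/bin/sh
# Sketch: print the configured omap backend of every running OSD on
# this host via its admin socket (default socket location assumed).
for sock in /var/run/ceph/ceph-osd.*.asok; do
    [ -S "$sock" ] || continue    # no running OSDs: loop is a no-op
    printf '%s: ' "$sock"
    ceph --admin-daemon "$sock" config show | grep filestore_omap_backend
done
```

Note that this only reports the configured backend; as seen above, it can say "leveldb" on every node even when the on-disk file extensions differ.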
Error message during start:

cat ceph-osd.0.log
2015-07-20 16:51:29.435081 7fb47b126840  0 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3), process ceph-osd, pid 9803
2015-07-20 16:51:29.457776 7fb47b126840  0 filestore(/var/lib/ceph/osd/ceph-0) backend generic (magic 0xef53)
2015-07-20 16:51:29.460470 7fb47b126840  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is supported and appears to work
2015-07-20 16:51:29.460479 7fb47b126840  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-07-20 16:51:29.485120 7fb47b126840  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syscall(SYS_syncfs, fd) fully supported
2015-07-20 16:51:29.572670 7fb47b126840  0 filestore(/var/lib/ceph/osd/ceph-0) limited size xattrs
2015-07-20 16:51:29.889599 7fb47b126840  0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-07-20 16:51:31.517179 7fb47b126840  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2015-07-20 16:51:31.552366 7fb47b126840  0 osd.0 151644 crush map has features 2303210029056, adjusting msgr requires for clients
2015-07-20 16:51:31.552375 7fb47b126840  0 osd.0 151644 crush map has features 2578087936000 was 8705, adjusting msgr requires for mons
2015-07-20 16:51:31.552382 7fb47b126840  0 osd.0 151644 crush map has features 2578087936000, adjusting msgr requires for osds
2015-07-20 16:51:31.552394 7fb47b126840  0 osd.0 151644 load_pgs
2015-07-20 16:51:42.682678 7fb47b126840 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, spg_t, ceph::bufferlist*)' thread 7fb47b126840 time 2015-07-20 16:51:42.680036
osd/PG.cc: 2825: FAILED assert(values.size() == 2)
 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x72) [0xcdb572]
 2: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0x7b2) [0x908742]
 3: (OSD::load_pgs()+0x734) [0x7e9064]
 4: (OSD::init()+0xdac) [0x7ed8fc]
 5: (main()+0x253e) [0x79069e]
 6: (__libc_start_main()+0xfd) [0x7fb47898fead]
 7: /usr/bin/ceph-osd() [0x7966b9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
...

Normally I would say that if one OSD node dies, I can simply reinstall the OS and ceph and I'm back in business... but this looks bad to me. Unfortunately, after I switched back to the old system disk, the system also fails to start 9 of the OSDs (only three of the big OSDs are running well).

What is the best solution here? Empty one node (crush weight 0), do a fresh reinstall of OS/ceph, and reinitialise all of its OSDs? That will take a long, long time, because we have 173 TB in this cluster...

I'd be happy about any hints.

Udo

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com