Hallo Dan, thanks for your patience!

On 5/22/2020 1:57 PM, Dan van der Ster wrote:
The procedure to overwrite a corrupted osdmap on a given osd is described at http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036592.html
I wouldn't do that type of low-level manipulation just yet -- better to understand the root cause of the corruptions first, before potentially making things worse.
Ok.
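Just so I am sure what I would be getting into later, if it comes to that: as I read that thread, the procedure boils down to exporting the good map from a healthy (stopped) OSD and injecting it into the broken one with ceph-objectstore-tool, roughly along these lines (osd.70 and the epoch below are placeholders from my cluster, and I have not actually run any of this, per your advice):

  # stop a healthy OSD and export its copy of the missing epoch
  systemctl stop ceph-osd@70
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/cephpa1-70 --op get-osdmap --epoch 141282 --file /tmp/osdmap.141282
  # overwrite the corrupted copy on the broken OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/cephpa1-63 --op set-osdmap --file /tmp/osdmap.141282

I will hold off until we understand the root cause.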
There is also this issue, which seems related: https://tracker.ceph.com/issues/24423 (it has a fix in mimic and nautilus). Could you share some more logs, e.g. the full backtrace from when the OSDs first crashed, and from their failing to start now? And maybe /var/log/messages shows crc mismatches?
Please find at https://pastebin.com/UjfmPT37 the first occurrence of the problem on OSD 72, and at https://pastebin.com/paumyAZ1 what happens when I start ceph-osd@72 after leaving it stopped for one minute (the same _open_db failure is the only thing I find in /var/log/messages).
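In case I am simply grepping for the wrong thing, this is the kind of search I ran over /var/log/messages, which should catch the _verify_csum/crc complaints you mention:

  # look for checksum complaints around the crash window
  grep -E '_verify_csum|bad crc' /var/log/messages

Please tell me if there is a better pattern to look for.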
On a different OSD (63), same kind as 72, I see in /var/log/messages:

May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.087529 7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected 0x1d9fdf3, device location [0x3105041f000~1000], logical extent 0x3cf000~1000, object #1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head#
May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.088194 7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected 0x1d9fdf3, device location [0x3105041f000~1000], logical extent 0x3cf000~1000, object #1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head#
May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.088856 7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected 0x1d9fdf3, device location [0x3105041f000~1000], logical extent 0x3cf000~1000, object #1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head#
May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.089659 7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected 0x1d9fdf3, device location [0x3105041f000~1000], logical extent 0x3cf000~1000, object #1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head#
May 21 13:20:40 r1srv07 ceph-osd: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::list::iterator&, BlueStore::WriteContext*)' thread 7fa90a000700 time 2020-05-21 13:20:40.089716
May 21 13:20:40 r1srv07 ceph-osd: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/os/bluestore/BlueStore.cc: 10176: FAILED assert(r >= 0 && r <= (int)head_read)
For this same OSD 63 I tried ceph-bluestore-tool:

[root@r1srv07.pa1 ceph]# time ceph-bluestore-tool --path /var/lib/ceph/osd/cephpa1-63 --deep 1 --command repair ; date
2020-05-22 16:12:19.643443 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2020-05-22 16:12:19.644037 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2020-05-22 16:12:19.644542 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2020-05-22 16:12:19.645019 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2020-05-22 16:12:19.645050 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) fsck error: #-1:7b3f43c4:::osd_superblock:0# error during read: 0~21a (5) Input/output error
2020-05-22 16:20:05.454969 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07, device location [0x304e857c000~1000], logical extent 0x36c000~1000, object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head#
2020-05-22 16:20:05.509771 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07, device location [0x304e857c000~1000], logical extent 0x36c000~1000, object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head#
2020-05-22 16:20:05.577770 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07, device location [0x304e857c000~1000], logical extent 0x36c000~1000, object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head#
2020-05-22 16:20:05.606838 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07, device location [0x304e857c000~1000], logical extent 0x36c000~1000, object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head#
2020-05-22 16:20:05.607031 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) fsck error: #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head# error during read: 0~400000 (5) Input/output error
..... still going on, but I am not confident it will end up in anything good; we'll see.
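In hindsight, maybe I should have started with the read-only check instead of jumping straight to repair, i.e. something like:

  # read-only deep check, should not modify anything on disk
  ceph-bluestore-tool --path /var/lib/ceph/osd/cephpa1-63 --deep 1 --command fsck

to first gauge the extent of the damage. I will do that on the next OSD I touch.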
Thanks! Fulvio
-- dan

On Fri, May 22, 2020 at 1:02 PM Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:

Hallo Dan, thanks for your reply! Very good to know about compression... I will not try to use it before upgrading to Nautilus. Problem is, I did not activate it on this cluster (see below). Moreover, that would only account for the issue on disks dedicated to object storage, if I understand it correctly.

The symptoms of your "*Second problem*" look similar, failing to load an osdmap. TBH, though, I believe that lz4 bug might only corrupt an osdmap when compression is enabled for the whole OSD, i.e. with bluestore_compression_mode=aggressive, not when compression is enabled just for specific pools. Alas, if you didn't have that enabled, then your bug must be something else.

Another important piece of info which I forgot in the previous message: my SAN is actually composed of 6 independent chains, such that a glitch at the SAN level is hardly the culprit, while something along the lines of the bug you pointed me to sounds more reasonable. Also, how should I deal with the "failed to load OSD map" error? Thanks! Fulvio

[root@r1srv05.pa1 ~]# grep compress /etc/ceph/cephpa1.conf
[root@r1srv05.pa1 ~]#
[root@r1srv05.pa1 ~]# ceph --cluster cephpa1 osd pool ls | sort | xargs -i ceph --cluster cephpa1 osd pool get {} compression_mode
Error ENOENT: option 'compression_mode' is not set on pool 'cephfs_data'
Error ENOENT: option 'compression_mode' is not set on pool 'cephfs_metadata'
Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-ec-pa1-cl1'
Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-ec-pa1-cl1-cache'
Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-pa1-cl1'
Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-pa1-devel'
Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-rr-pa1-cl1'
Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.buckets.data'
Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.buckets.index'
Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.control'
Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.data.root'
Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.gc'
Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.log'
Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.users.keys'
Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.users.uid'
Error ENOENT: option 'compression_mode' is not set on pool 'ec-pool-fg'
Error ENOENT: option 'compression_mode' is not set on pool 'glance-ct1-cl1'
Error ENOENT: option 'compression_mode' is not set on pool 'glance-pa1-cl1'
Error ENOENT: option 'compression_mode' is not set on pool 'glance-pa1-devel'
Error ENOENT: option 'compression_mode' is not set on pool 'gnocchi-pa1-cl1'
Error ENOENT: option 'compression_mode' is not set on pool 'iscsivolumes'
Error ENOENT: option 'compression_mode' is not set on pool 'k8s'
Error ENOENT: option 'compression_mode' is not set on pool 'rbd'
Error ENOENT: option 'compression_mode' is not set on pool '.rgw.root'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.intent-log'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.log'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.buckets'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.buckets.extra'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.buckets.index'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.control'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.gc'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.root'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.usage'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users.email'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users.swift'
Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users.uid'
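For completeness, since pool-level options are only half the picture: I guess I could also ask an OSD which daemon-wide value it would apply, without the daemon running, with something like (untested on my side):

  # print the compression mode osd.63 would use; I expect "none" here
  ceph-osd --cluster cephpa1 -i 63 --show-config-value bluestore_compression_mode

given that nothing in ceph.conf sets it.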
On 5/22/2020 8:50 AM, Dan van der Ster wrote:

Hi Fulvio,
The symptom of several osd's all asserting at the same time in OSDMap::get_map really sounds like this bug: https://tracker.ceph.com/issues/39525
lz4 compression is buggy on CentOS 7 and Ubuntu 18.04 -- you need to disable compression or use a different algorithm. Mimic and nautilus will get a workaround, but it's not planned to be backported to luminous.

-- Dan

On Thu, May 21, 2020 at 11:18 PM Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:

Hallo all,
I hope you can help me with some very strange problems which arose suddenly today. I tried to search, including in this mailing list, but could not find anything relevant.

At some point today, without any action from my side, I noticed that some OSDs in my production cluster would go down and never come up. I am on Luminous 12.2.13, CentOS7, kernel 3.10: my setup is non-standard, as OSD disks are served off a SAN (which is for sure OK now, although I cannot exclude some glitch). I tried to reboot the OSD servers a few times, ran "activate --all", and added bluestore_ignore_data_csum=true to the [osd] section in ceph.conf... the number of "down" OSDs changed for a while but now seems rather stable.

There are actually two classes of problems (a bit more detail right below):
- ERROR: osd init failed: (5) Input/output error
- failed to load OSD map for epoch 141282, got 0 bytes

*First problem*
This affects 50 OSDs (all disks of this kind, on all but one server): these OSDs are reserved for object storage, but I am not yet using them, so I may in principle recreate them. Still, I would be interested in understanding what the problem is, and in learning how to solve it for future reference. Here is what I see in logs:
.....
2020-05-21 21:17:48.661348 7fa2e9a95ec0 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/cephpa1-72/block size 14.5TiB
2020-05-21 21:17:48.661428 7fa2e9a95ec0 1 bluefs mount
2020-05-21 21:17:48.662040 7fa2e9a95ec0 1 bluefs _init_alloc id 1 alloc_size 0x10000 size 0xe83a3400000
2020-05-21 21:52:43.858464 7fa2e9a95ec0 -1 bluefs mount failed to replay log: (5) Input/output error
2020-05-21 21:52:43.858589 7fa2e9a95ec0 1 fbmap_alloc 0x55c6bba92e00 shutdown
2020-05-21 21:52:43.858728 7fa2e9a95ec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-72) _open_db failed bluefs mount: (5) Input/output error
2020-05-21 21:52:43.858790 7fa2e9a95ec0 1 bdev(0x55c6bbdb6600 /var/lib/ceph/osd/cephpa1-72/block) close
2020-05-21 21:52:44.103536 7fa2e9a95ec0 1 bdev(0x55c6bbdb8600 /var/lib/ceph/osd/cephpa1-72/block) close
2020-05-21 21:52:44.352899 7fa2e9a95ec0 -1 osd.72 0 OSD:init: unable to mount object store
2020-05-21 21:52:44.352956 7fa2e9a95ec0 -1 ** ERROR: osd init failed: (5) Input/output error

*Second problem*
This affects 11 OSDs, which I use *in production* for Cinder block storage: it looks like all PGs for this pool are currently OK. Here is the excerpt from the logs:

.....
  -5> 2020-05-21 20:52:06.756469 7fd2ccc19ec0 0 _get_class not permitted to load kvs
  -4> 2020-05-21 20:52:06.759686 7fd2ccc19ec0 1 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/cls/rgw/cls_rgw.cc:3869: Loaded rgw class!
  -3> 2020-05-21 20:52:06.760021 7fd2ccc19ec0 1 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/cls/log/cls_log.cc:299: Loaded log class!
  -2> 2020-05-21 20:52:06.760730 7fd2ccc19ec0 1 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/cls/replica_log/cls_replica_log.cc:135: Loaded replica log class!
  -1> 2020-05-21 20:52:06.760873 7fd2ccc19ec0 -1 osd.63 0 failed to load OSD map for epoch 141282, got 0 bytes
   0> 2020-05-21 20:52:06.763277 7fd2ccc19ec0 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fd2ccc19ec0 time 2020-05-21 20:52:06.760916
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/osd/OSD.h: 994: FAILED assert(ret)

Has anyone any idea how I could fix these problems, or what I could do to try and shed some light on them? Also, what caused them, and is there some magic configuration flag I could use to protect my cluster?

Thanks a lot for your help!

Fulvio
--
Fulvio Galeazzi
GARR-CSD Department
skype: fgaleazzi70
tel.: +39-334-6533-250
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx