Hallo Dan, all. My attempt with ceph-bluestore-tool did not lead to a working OSD.So I decided to re-create all OSDs, as they were quite many and my cluster was rather unbalanced. Too bad I could not get any insight as to what caused the issue on the OSDs for object storage: however, I will update to Nautilus in 1 month or so, so I decided to consider it as "history".
Thanks again Dan for your help! Fulvio Il 5/22/2020 10:43 PM, Fulvio Galeazzi ha scritto:
Hallo Dan, thanks for your patience! Il 5/22/2020 1:57 PM, Dan van der Ster ha scritto:The procedure to overwrite a corrupted osdmap on a given osd isdescribed at http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036592.htmlI wouldn't do that type of low level manipulation just yet -- better to understand the root cause of he corruptions first before potentially making things worse.Ok.There is also this issue which seems related: https://tracker.ceph.com/issues/24423 (It has a fix in mimic and nautilus). Could you share some more logs e.g. with the full backtrace from the time they first crashed, and now failing to start. And maybe /var/log/messages shows crc mismatches?Please find at https://pastebin.com/UjfmPT37 the first occurrence of problem on OSD 72, and at https://pastebin.com/paumyAZ1 what happens when I start ceph-osd@72 after leaving it stopped for one minute (the same open_db failure is the only thing I find in /var/log/messages).On a different OSD (63), same kind as 72, I see in /var/log/messages:May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.087529 7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected 0x1d9fdf3, device location [0x3105041f000~1000], logical extent 0x3cf000~1000, object #1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head# May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.088194 7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected 0x1d9fdf3, device location [0x3105041f000~1000], logical extent 0x3cf000~1000, object #1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head# May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.088856 7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected 0x1d9fdf3, device location [0x3105041f000~1000], logical extent 0x3cf000~1000, object #1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head# May 21 13:20:40 r1srv07 ceph-osd: 2020-05-21 13:20:40.089659 7fa90a000700 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x4f000, got 0x69b740ce, expected 0x1d9fdf3, device location [0x3105041f000~1000], logical extent 0x3cf000~1000, object #1:0d1f4ad3:::rbd_data.71b487741226bb.0000000000003d1a:head# May 21 13:20:40 r1srv07 ceph-osd: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_do_write_small(BlueStore::TransContext*, BlueStore::CollectionRef&, BlueStore::OnodeRef, uint64_t, uint64_t, ceph::buffer::list::iterator&, BlueStore::WriteContext*)' thread 7fa90a000700 time 2020-05-21 13:20:40.089716 May 21 13:20:40 r1srv07 ceph-osd: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/os/bluestore/BlueStore.cc: 10176: FAILED assert(r >= 0 && r <= (int)head_read)For this same OSD 63 I tried ceph-bluestore-tool:[root@r1srv07.pa1 ceph]# time ceph-bluestore-tool --path /var/lib/ceph/osd/cephpa1-63 --deep 1 --command repair ; date 2020-05-22 16:12:19.643443 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2020-05-22 16:12:19.644037 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2020-05-22 16:12:19.644542 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2020-05-22 16:12:19.645019 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x1ff97a1a, expected 0xb0ba2652, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2020-05-22 16:12:19.645050 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) fsck error: #-1:7b3f43c4:::osd_superblock:0# error during read: 0~21a (5) Input/output error 2020-05-22 16:20:05.454969 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07, device location [0x304e857c000~1000], logical extent 0x36c000~1000, object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head# 2020-05-22 16:20:05.509771 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07, device location [0x304e857c000~1000], logical extent 0x36c000~1000, object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head# 2020-05-22 16:20:05.577770 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07, device location [0x304e857c000~1000], logical extent 0x36c000~1000, object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head# 2020-05-22 16:20:05.606838 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x6c000, got 0x513aee13, expected 0xacd4bf07, device location [0x304e857c000~1000], logical extent 0x36c000~1000, object #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head# 2020-05-22 16:20:05.607031 7ff93afaeec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-63) fsck error: #1:01fdf0e0:::rbd_data.0011e419495cff.00000000000000a0:head# error during read: 0~400000 (5) Input/output error.....still going on, but I am not confident it will end up in anything good, will see.Thanks! Fulvio-- danOn Fri, May 22, 2020 at 1:02 PM Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:Hallo Dan, thanks for your reply! Very good to know about compression... will not try to use it before upgrading to Nautilus. Problem is, I did not activate it on this cluster (see below). Moreover, that would only account for the issue on disks dedicated to object storage, if I understand it correctly.The symptoms of your "*Second problem*" look similar, failing to load an osdmap. TBH though, BTW, I believe that that lz4 bug might only corrupt an osdmap when compression is enabled for the whole OSD, i.e. with bluestore_compression_mode=aggressive, not when compression is enabled just for specific pools.Alas, if you didn't have that enabled, then your bug must be something else.Another important info which I forgot in the previous message: my SAN is actually composed of 6 independent chains, such that a glitch at the SAN level is hardly the culprit, while something along the lines of the bug you pointed me to sounds more reasonable. Also, how to deal with the "failed to load with the OSD map"? Thanks! Fulvio [root@r1srv05.pa1 ~]# grep compress /etc/ceph/cephpa1.conf [root@r1srv05.pa1 ~]# [root@r1srv05.pa1 ~]# ceph --cluster cephpa1 osd pool ls | sort | xargs -i ceph --cluster ceph osd pool get {} compression_mode Error ENOENT: option 'compression_mode' is not set on pool 'cephfs_data'Error ENOENT: option 'compression_mode' is not set on pool 'cephfs_metadata'Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-ec-pa1-cl1' Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-ec-pa1-cl1-cache' Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-pa1-cl1' Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-pa1-devel' Error ENOENT: option 'compression_mode' is not set on pool 'cinder-ceph-rr-pa1-cl1' Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.buckets.data' Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.buckets.index' Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.control' Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.data.root'Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.gc' Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.log'Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.users.keys' Error ENOENT: option 'compression_mode' is not set on pool 'default.rgw.users.uid' Error ENOENT: option 'compression_mode' is not set on pool 'ec-pool-fg'Error ENOENT: option 'compression_mode' is not set on pool 'glance-ct1-cl1' Error ENOENT: option 'compression_mode' is not set on pool 'glance-pa1-cl1'Error ENOENT: option 'compression_mode' is not set on pool 'glance-pa1-devel'Error ENOENT: option 'compression_mode' is not set on pool 'gnocchi-pa1-cl1' Error ENOENT: option 'compression_mode' is not set on pool 'iscsivolumes'Error ENOENT: option 'compression_mode' is not set on pool 'k8s' Error ENOENT: option 'compression_mode' is not set on pool 'rbd' Error ENOENT: option 'compression_mode' is not set on pool '.rgw.root' Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.intent-log' Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.log' Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw' Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.buckets' Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.buckets.extra' Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.buckets.index' Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.control'Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.gc'Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.rgw.root'Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.usage' Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users'Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users.email' Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users.swift' Error ENOENT: option 'compression_mode' is not set on pool 'testrgw.users.uid' Il 5/22/2020 8:50 AM, Dan van der Ster ha scritto:Hi Fulvio, The symptom of several osd's all asserting at the same time in OSDMap::get_map really sounds like this bug: https://tracker.ceph.com/issues/39525 lz4 compression is buggy on CentOS 7 and Ubuntu 18.04 -- you need to disable compression or use a different algorithm. Mimic and nautilus will get a workaround, but it's not planned to be backported to luminous. -- Dan On Thu, May 21, 2020 at 11:18 PM Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> wrote:Hallo all, hope you can help me with very strange problems which arose suddenly today. Tried to search, also in this mailing list, but could not find anything relevant. At some point today, without any action from my side, I noticed some OSDs in my production cluster would go down and never come up.I am on Luminous 12.2.13, CentOS7, kernel 3.10: my setup is non-standard as OSD disks are served off a SAN (which is for sure OK now, although Icannot exclude some glitch). Tried to reboot OSD servers a few times, ran "activate --all", added bluestore_ignore_data_csum=true in the [osd] section in ceph.conf...the number of "down" OSDs changed for a while but now seems rather stable.There are actually two classes of problems (bit more details right below):- ERROR: osd init failed: (5) Input/output error - failed to load OSD map for epoch 141282, got 0 bytes *First problem* This affects 50 OSDs (all disks of this kind, on all but one server): these OSDs are reserved for object storage but I am not yet using them so I may in principle recreate them. But would be interested inunderstanding what the problem is, and learn how to solve it for futurereference. Here is what I see in logs: .....2020-05-21 21:17:48.661348 7fa2e9a95ec0 1 bluefs add_block_device bdev1 path /var/lib/ceph/osd/cephpa1-72/block size 14.5TiB 2020-05-21 21:17:48.661428 7fa2e9a95ec0 1 bluefs mount 2020-05-21 21:17:48.662040 7fa2e9a95ec0 1 bluefs _init_alloc id 1 alloc_size 0x10000 size 0xe83a34000002020-05-21 21:52:43.858464 7fa2e9a95ec0 -1 bluefs mount failed to replaylog: (5) Input/output error 2020-05-21 21:52:43.858589 7fa2e9a95ec0 1 fbmap_alloc 0x55c6bba92e00 shutdown 2020-05-21 21:52:43.858728 7fa2e9a95ec0 -1 bluestore(/var/lib/ceph/osd/cephpa1-72) _open_db failed bluefs mount: (5) Input/output error 2020-05-21 21:52:43.858790 7fa2e9a95ec0 1 bdev(0x55c6bbdb6600 /var/lib/ceph/osd/cephpa1-72/block) close 2020-05-21 21:52:44.103536 7fa2e9a95ec0 1 bdev(0x55c6bbdb8600 /var/lib/ceph/osd/cephpa1-72/block) close2020-05-21 21:52:44.352899 7fa2e9a95ec0 -1 osd.72 0 OSD:init: unable tomount object store2020-05-21 21:52:44.352956 7fa2e9a95ec0 -1 ESC[0;31m ** ERROR: osd initfailed: (5) Input/output errorESC[0m *Second problem* This affects 11 OSDs, which I use *in production* for Cinder block storage: looks like all PGs for this pool are currently OK. Here is the excerpt from the logs. ..... -5> 2020-05-21 20:52:06.756469 7fd2ccc19ec0 0 _get_class not permitted to load kvs -4> 2020-05-21 20:52:06.759686 7fd2ccc19ec0 1 <cls>/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/cls/rgw/cls_rgw.cc:3869:Loaded rgw class! -3> 2020-05-21 20:52:06.760021 7fd2ccc19ec0 1 <cls>/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/cls/log/cls_log.cc:299:Loaded log class! -2> 2020-05-21 20:52:06.760730 7fd2ccc19ec0 1 <cls>/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/cls/replica_log/cls_replica_log.cc:135:Loaded replica log class!-1> 2020-05-21 20:52:06.760873 7fd2ccc19ec0 -1 osd.63 0 failed toload OSD map for epoch 141282, got 0 bytes 0> 2020-05-21 20:52:06.763277 7fd2ccc19ec0 -1/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/osd/OSD.h:In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fd2ccc19ec0time 2020-05-21 20:52:06.760916/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/osd/OSD.h:994: FAILED assert(ret)Has anyone any idea how I could fix these problems, or what I coulddo to try and shed some light? And also, what caused them, and whetherthere is some magic configuration flag I could use to protect my cluster?Thanks a lot for your help! Fulvio _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx-- Fulvio Galeazzi GARR-CSD Department skype: fgaleazzi70 tel.: +39-334-6533-250
-- Fulvio Galeazzi GARR-CSD Department skype: fgaleazzi70 tel.: +39-334-6533-250
Attachment:
smime.p7s
Description: Firma crittografica S/MIME
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx