Hello Cephers,
I have two identical Ceph clusters with 32 OSDs each, running radosgw with erasure coding (EC). Both were running Octopus on Ubuntu 20.04.
On one of these clusters, I upgraded the OS to Ubuntu 22.04 and Ceph to Quincy 17.2.6. This cluster completed the process without any issue and works as expected.
On the second cluster, I followed the same procedure and upgraded the cluster. After the upgrade, 9 of the 32 OSDs cannot be activated. As far as I understand, the labels of these OSDs cannot be read. The ceph-volume lvm activate {osd_id} {osd_fsid} command fails as below:
stderr: failed to read label for /dev/ceph-block-13/block-13: (5) Input/output error
stderr: 2023-12-19T11:46:25.310+0300 7f088cd7ea80 -1 bluestore(/dev/ceph-block-13/block-13) _read_bdev_label bad crc on label, expected 2340927273 != actual 2067505886
All ceph-bluestore-tool and ceph-objectstore-tool commands fail with the same message, so I cannot try repair, fsck, or migrate.
# ceph-bluestore-tool repair --deep yes --path /var/lib/ceph/osd/ceph-13/
failed to load os-type: (2) No such file or directory
2023-12-19T13:57:06.551+0300 7f39b1635a80 -1 bluestore(/var/lib/ceph/osd/ceph-13/block) _read_bdev_label bad crc on label, expected 2340927273 != actual 2067505886
I also tried show-label with ceph-bluestore-tool, without success.
# ceph-bluestore-tool show-label --dev /dev/ceph-block-13/block-13
unable to read label for /dev/ceph-block-13/block-13: (5) Input/output error
2023-12-19T14:01:19.668+0300 7fdcdd111a80 -1 bluestore(/dev/ceph-block-13/block-13) _read_bdev_label bad crc on label, expected 2340927273 != actual 2067505886
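Since the tools refuse to even parse the label, I plan to inspect the raw label region directly. As far as I know, the BlueStore label lives in the first 4 KiB of the block device and starts with the plain-text magic "bluestore block device" followed by the OSD fsid. A minimal sketch (device path is my osd.13; adjust as needed):

```shell
# Dump the first 4 KiB of the LV, where BlueStore keeps its label, and look
# for the plain-text magic that a readable label starts with.
dev=/dev/ceph-block-13/block-13
dd if="$dev" bs=4096 count=1 2>/dev/null > /tmp/osd13-label.bin
strings /tmp/osd13-label.bin | head -n 3
```

If the magic string and fsid are still intact and only the trailing encoded part of the label is damaged, that would at least narrow down how much of the label region was overwritten.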
I can get the information, including osd_fsid and block uuid, of all failed OSDs via ceph-volume lvm list, as below:
====== osd.13 ======
[block] /dev/ceph-block-13/block-13
block device /dev/ceph-block-13/block-13
block uuid jFaTba-ln5r-muQd-7Ef9-3tWe-JwvO-qW9nqi
cephx lockbox secret
cluster fsid 4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f
cluster name ceph
crush device class None
encrypted 0
osd fsid c9ee3ef6-73d7-4029-9cd6-086cc95d2f27
osd id 13
osdspec affinity
type block
vdo 0
devices /dev/mapper/mpathb
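As a side note, to collect the osd id / fsid pairs for all of the failed OSDs in one go, ceph-volume can emit JSON; a sketch assuming jq is available:

```shell
# Print "osd_id osd_fsid" for every block device ceph-volume knows about.
ceph-volume lvm list --format json \
  | jq -r 'to_entries[] | .value[] | select(.type == "block")
           | "\(.tags["ceph.osd_id"]) \(.tags["ceph.osd_fsid"])"'
```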
All VGs and LVs look healthy.
# lvdisplay ceph-block-13/block-13
--- Logical volume ---
LV Path /dev/ceph-block-13/block-13
LV Name block-13
VG Name ceph-block-13
LV UUID jFaTba-ln5r-muQd-7Ef9-3tWe-JwvO-qW9nqi
LV Write Access read/write
LV Creation host, time ank-backup01, 2023-11-29 10:41:53 +0300
LV Status available
# open 0
LV Size <7.28 TiB
Current LE 1907721
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:33
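Since each LV sits on a multipath device (/dev/mapper/mpathb above), I also want to rule out the LV mapping having shifted during the OS upgrade. For a linear device-mapper target, dmsetup table prints `start length linear backing_dev offset`, so the LV's start offset on its backing device can be read out. A rough sketch (the fallback sample line is hypothetical, for when the device is absent; note the dm name doubles hyphens, VG-LV):

```shell
# Read the dm table for the LV and extract backing device and start offset.
table_line=$(sudo dmsetup table ceph--block--13-block--13 2>/dev/null)
[ -n "$table_line" ] || table_line="0 15628025856 linear 253:2 2048"  # hypothetical sample
backing=$(echo "$table_line" | awk '{print $4}')
offset=$(echo "$table_line" | awk '{print $5}')
echo "LV starts at sector $offset on device $backing"
```

If the offset differs between a healthy and a failed OSD, that would point at a mapping problem rather than on-disk corruption.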
This is a single-node cluster running only radosgw. The environment is as follows:
# ceph -v
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
# ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "osd_replicated_rule",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "default~hdd"
            },
            {
                "op": "choose_firstn",
                "num": 0,
                "type": "osd"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 2,
        "rule_name": "default.rgw.buckets.data",
        "type": 3,
        "steps": [
            {
                "op": "set_chooseleaf_tries",
                "num": 5
            },
            {
                "op": "set_choose_tries",
                "num": 100
            },
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "choose_indep",
                "num": 0,
                "type": "osd"
            },
            {
                "op": "emit"
            }
        ]
    }
]
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 226.29962 root default
-3 226.29962 host ank-backup01
0 hdd 7.29999 osd.0 up 1.00000 1.00000
1 hdd 7.29999 osd.1 up 1.00000 1.00000
2 hdd 7.29999 osd.2 up 1.00000 1.00000
3 hdd 7.29999 osd.3 up 1.00000 1.00000
4 hdd 7.29999 osd.4 up 1.00000 1.00000
5 hdd 7.29999 osd.5 up 1.00000 1.00000
6 hdd 7.29999 osd.6 up 1.00000 1.00000
7 hdd 7.29999 osd.7 up 1.00000 1.00000
8 hdd 7.29999 osd.8 up 1.00000 1.00000
9 hdd 7.29999 osd.9 up 1.00000 1.00000
10 hdd 7.29999 osd.10 up 1.00000 1.00000
11 hdd 7.29999 osd.11 up 1.00000 1.00000
12 hdd 7.29999 osd.12 down 0 1.00000
13 hdd 7.29999 osd.13 down 0 1.00000
14 hdd 7.29999 osd.14 down 0 1.00000
15 hdd 7.29999 osd.15 down 0 1.00000
16 hdd 7.29999 osd.16 down 0 1.00000
17 hdd 7.29999 osd.17 down 0 1.00000
18 hdd 7.29999 osd.18 down 0 1.00000
19 hdd 7.29999 osd.19 down 0 1.00000
20 hdd 7.29999 osd.20 down 0 1.00000
21 hdd 7.29999 osd.21 up 1.00000 1.00000
22 hdd 7.29999 osd.22 up 1.00000 1.00000
23 hdd 7.29999 osd.23 up 1.00000 1.00000
24 hdd 7.29999 osd.24 up 1.00000 1.00000
25 hdd 7.29999 osd.25 up 1.00000 1.00000
26 hdd 7.29999 osd.26 up 1.00000 1.00000
27 hdd 7.29999 osd.27 up 1.00000 1.00000
28 hdd 7.29999 osd.28 up 1.00000 1.00000
29 hdd 7.29999 osd.29 up 1.00000 1.00000
30 hdd 7.29999 osd.30 up 1.00000 1.00000
31 hdd 7.29999 osd.31 up 1.00000 1.00000
Does anybody have any idea why the labels of these OSDs cannot be read? Any help would be appreciated.
Best Regards,
Huseyin Cotuk
hcotuk@xxxxxxxxx