Hi,
could you resend the output with the original formatting? It's hard to
read without whitespace. Since your osd tree looks a bit unusual, do
you maybe see OOM killer messages or anything similar on the affected
hosts, e.g. with the commands below? I didn't have a chance to look
into the logs yet, maybe someone else already has.
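To check for OOM kills, something like this on the OSD hosts should be
enough (adjust the time window to when the OSDs went down; the
timestamp is just an example):

dmesg -T | grep -i -E 'out of memory|oom-killer|killed process'
journalctl -k --since "2024-03-08 23:00" | grep -i oom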
Thanks,
Eugen
Quoting Romain Lebbadi-Breteau <romain.lebbadi-breteau@xxxxxxxxxx>:
Hi,
Yes, we're trying to remove osd.3. Here is the result of `ceph osd df` :
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 3  hdd    1.81879   1.00000  1.8 TiB  443 GiB  441 GiB  6.8 MiB  1.5 GiB  1.4 TiB  23.78  2.37   16      up
 6  hdd    1.81879   1.00000  1.8 TiB  114 GiB  114 GiB  981 KiB  343 MiB  1.7 TiB   6.14  0.61    8      up
12  hdd    1.81879   1.00000  1.8 TiB  359 GiB  358 GiB  5.8 MiB  1.0 GiB  1.5 TiB  19.27  1.92   15      up
13  hdd    1.81879   1.00000  1.8 TiB  331 GiB  330 GiB  3.9 MiB  1.5 GiB  1.5 TiB  17.77  1.77   15      up
15  hdd    1.81879   1.00000  1.8 TiB  217 GiB  216 GiB  2.0 MiB  1.1 GiB  1.6 TiB  11.64  1.16   13      up
16  hdd    9.09520   1.00000  9.1 TiB  785 GiB  783 GiB  8.8 MiB  1.9 GiB  8.3 TiB   8.43  0.84   51      up
17  hdd    1.81879   1.00000  1.8 TiB  204 GiB  203 GiB  2.9 MiB  1.2 GiB  1.6 TiB  10.95  1.09   11      up
 1  hdd    5.45749   1.00000  5.5 TiB  428 GiB  427 GiB  4.9 MiB  876 MiB  5.0 TiB   7.66  0.76   24      up
 4  hdd    5.45749   1.00000  5.5 TiB  638 GiB  636 GiB  6.8 MiB  2.2 GiB  4.8 TiB  11.42  1.14   36      up
 8  hdd    5.45749   1.00000  5.5 TiB  594 GiB  591 GiB  8.7 MiB  2.2 GiB  4.9 TiB  10.62  1.06   30      up
11  hdd    5.45749   1.00000  5.5 TiB  567 GiB  565 GiB  7.8 MiB  2.1 GiB  4.9 TiB  10.15  1.01   29      up
14  hdd    1.81879   1.00000  1.8 TiB  197 GiB  195 GiB  2.9 MiB  1.2 GiB  1.6 TiB  10.55  1.05   10      up
 0  hdd    9.09520   1.00000  9.1 TiB  764 GiB  763 GiB  9.6 MiB  1.8 GiB  8.3 TiB   8.21  0.82   47      up
 5  hdd    9.09520   1.00000  9.1 TiB  791 GiB  789 GiB   11 MiB  2.6 GiB  8.3 TiB   8.50  0.85   38      up
 9  hdd    9.09520   1.00000  9.1 TiB  858 GiB  856 GiB   11 MiB  2.4 GiB  8.3 TiB   9.21  0.92   44      up
                     TOTAL     71 TiB  7.1 TiB  7.1 TiB   93 MiB   24 GiB   64 TiB  10.03
MIN/MAX VAR: 0.61/2.37  STDDEV: 4.97
And here is `ceph osd pool ls detail` (and yes, our replicated size is 3) :
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 32 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9327 lfor 0/0/104 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 3 'images' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9018 lfor 0/0/104 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 4 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9149 lfor 0/0/106 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 5 'polyphoto_backup' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 372 lfor 0/0/362 flags hashpspool,selfmanaged_snaps stripe_width 0 compression_algorithm snappy compression_mode aggressive application rbd
And we're using quincy :
romain:step@alpha-cen ~ $ sudo ceph --version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Each of our physical disks is exposed as its own RAID 0 virtual disk
by the built-in RAID controller (PERC H730 Mini).
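(If it helps, I think we can also pull SMART data for the failing disk
through the controller with something like the command below; the
megaraid device id 4 is only a guess based on the iDRAC message about
Disk 4, and the block device name is a placeholder:

sudo smartctl -d megaraid,4 -a /dev/sda
)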
As for the original logs, the journal has rotated and we don't have them anymore.
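(So this doesn't happen again, we'll probably make the journal
persistent; as far as I understand, creating /var/log/journal and
restarting journald, or setting Storage=persistent in
/etc/systemd/journald.conf, should be enough:

sudo mkdir -p /var/log/journal
sudo systemctl restart systemd-journald
)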
I recreated the situation where the three OSDs crash (shutting down
osd.3 and marking it out), and here are the logs :
ceph -w : https://pastebin.com/A7gJ3ss2
osd.0, osd.3 and osd.11 :
https://gitlab.com/RomainL456/ceph-incident-logs/
I put the full logs (output of `sudo journalctl --since "23:00" -u
ceph-9b4b12fe-4dc6-11ed-9ed9-d18a342d7c2b@osd.*`) in a public Git
repo, and I also added a separate file with just the logs from right
before osd.0 crashed.
Here is the timeline of events (local time) :
23:27 : I manually shut down osd.3
23:46 : osd.0 crashes
23:46 : osd.11 crashes
23:48 : I start osd.3; it crashes in less than a minute
23:49 : After I mark osd.3 "in" and start it again, it comes back
online, with osd.0 and osd.11 following soon after
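For completeness, the steps above boil down to something like the
following commands (the OSDs are managed by cephadm in our setup, so
we went through the orchestrator rather than plain systemctl):

# 23:27 : stop osd.3 and mark it out
sudo ceph orch daemon stop osd.3
sudo ceph osd out 3

# 23:48 / 23:49 : mark it back in and start it again
sudo ceph osd in 3
sudo ceph orch daemon start osd.3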
Best regards,
Romain Lebbadi-Breteau
On 2024-03-08 3:17 a.m., Eugen Block wrote:
Hi,
can you share more details? Which OSD are you trying to get out,
the primary osd.3?
Can you also share 'ceph osd df'?
It looks like a replicated pool with size 3, can you confirm with
'ceph osd pool ls detail'?
Do you have logs from the crashing OSDs when you take out osd.3?
Which ceph version is this?
Thanks,
Eugen
Quoting Romain Lebbadi-Breteau <romain.lebbadi-breteau@xxxxxxxxxx>:
Hi,
We're a student club from Montréal hosting an OpenStack cloud with a
Ceph backend that stores virtual machine disks and volumes via RBD.
Two weeks ago we received an email from our Ceph cluster saying
that some placement groups (PGs) were damaged. We ran "sudo ceph pg repair <pg-id>"
but then there was an I/O error on the disk during the recovery
("An unrecoverable disk media error occurred on Disk 4 in
Backplane 1 of Integrated RAID Controller 1." and "Bad block
medium error is detected at block 0x1377e2ad on Virtual Disk 3 on
Integrated RAID Controller 1." messages on iDRAC).
After that, the PG we tried to repair was in the state
"active+recovery_unfound+degraded". After a week, we ran the
command "sudo ceph pg 2.1b mark_unfound_lost revert" to try to
recover the damaged PG. We tried to boot the virtual machine that
had crashed because of this incident, but the volume seemed to have
been completely erased (the "mount" command said there was no
filesystem on it), so we recreated the VM from a backup.
A few days later, the same PG was once again damaged, and since we
knew the physical disk on the OSD hosting one part of the PG had
problems, we tried to "out" the OSD from the cluster. That caused
the two other OSDs hosting copies of the problematic PG to go down
as well, which led to timeouts on our virtual machines, so we put
the OSD back in.
We then tried to repair the PG again, but that failed and the PG is
now "active+clean+inconsistent+failed_repair". Whenever the OSD with
the bad disk goes down, two other OSDs from two other hosts go down
too after a few minutes, so it's impossible to replace the disk right
now, even though we have new ones available.
We have backups for most of our services, but it would be very
disruptive to delete the whole cluster, and we don't know what to
do with the broken PG and the OSD that can't be shut down.
Any help would be really appreciated. We're not experts with Ceph
and OpenStack, and it's likely we handled things wrong at some
point, but we really want to get back to a healthy Ceph.
Here is some information about our cluster :
romain:step@alpha-cen ~ $ sudo ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
[ERR] OSD_SCRUB_ERRORS: 1 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 2.1b is active+clean+inconsistent+failed_repair, acting [3,11,0]
romain:step@alpha-cen ~ $ sudo ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 70.94226 root default
-7 20.00792 host alpha-cen
3 hdd 1.81879 osd.3 up 1.00000 1.00000
6 hdd 1.81879 osd.6 up 1.00000 1.00000
12 hdd 1.81879 osd.12 up 1.00000 1.00000
13 hdd 1.81879 osd.13 up 1.00000 1.00000
15 hdd 1.81879 osd.15 up 1.00000 1.00000
16 hdd 9.09520 osd.16 up 1.00000 1.00000
17 hdd 1.81879 osd.17 up 1.00000 1.00000
-5 23.64874 host beta-cen
1 hdd 5.45749 osd.1 up 1.00000 1.00000
4 hdd 5.45749 osd.4 up 1.00000 1.00000
8 hdd 5.45749 osd.8 up 1.00000 1.00000
11 hdd 5.45749 osd.11 up 1.00000 1.00000
14 hdd 1.81879 osd.14 up 1.00000 1.00000
-3 27.28560 host gamma-cen
0 hdd 9.09520 osd.0 up 1.00000 1.00000
5 hdd 9.09520 osd.5 up 1.00000 1.00000
9 hdd 9.09520 osd.9 up 1.00000 1.00000
romain:step@alpha-cen ~ $ sudo rados list-inconsistent-obj 2.1b
{"epoch":9787,"inconsistents":[]}
romain:step@alpha-cen ~ $ sudo ceph pg 2.1b query
https://pastebin.com/gsKCPCjr
Best regards,
Romain Lebbadi-Breteau
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx