Hi,
I am trying to gather experience with a Ceph STAGE cluster; it consists of virtual machines - which is not ideal, I know. The VMs run Debian 12 and podman-4.3.1. There is practically no load on this Ceph - there is just one client using the storage, and it barely does anything. This is what happened:
- "During data consistency checks (scrub), at least one PG has been flagged as being damaged or inconsistent."
- so I listed them (["2.3","2.58"])
- and tried to repair them ("ceph pg repair 2.3", "ceph pg repair 2.58"; the rough command sequence I used is sketched after this list)
- both repairs went well (resulting in "pgs: 129 active+clean"), but the cluster kept its "HEALTH_WARN" state ("Too many repaired reads on 1 OSDs")
- so I googled for this message, and the only thing I found was to restart the OSD to get rid of the message and - more importantly - the cluster WARN state ("ceph orch daemon restart osd.3")
- after the restart, my cluster was still in WARN state, now complaining that 2 PGs had been flagged as damaged or inconsistent - but different PGs on different OSDs
- I "ceph pg repair"ed them, too, and the cluster's state was WARN afterwards, again ("Too many repaired reads on 1 OSDs")
- when I restarted the OSD ("ceph orch daemon restart osd.2"), the crash occurred; Ceph marked this OSD "down" and "out" and suspected a hardware issue, although the OSD HDDs are in fact QEMU "hard disks"
- I can't judge whether it's a serious bug or just due to my non-optimal STAGE setup, so I'll attach the gzipped log of osd.2
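In case the exact sequence matters, this is roughly what I ran (reconstructed from my shell history; "<pool>" stands for my data pool, and the PG IDs are the ones from above):

    # find the inconsistent PGs reported by the scrub warning
    ceph health detail
    rados list-inconsistent-pg <pool>
    # inspect one of them, just to see what is inconsistent
    rados list-inconsistent-obj 2.3 --format=json-pretty
    # repair the flagged PGs
    ceph pg repair 2.3
    ceph pg repair 2.58
    # restart the OSD that still shows "Too many repaired reads"
    ceph orch daemon restart osd.3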
I need help to understand what happened and how to prevent it in the future. What is this "Too many repaired reads" and how should I deal with it?
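From what I could google, the warning seems to be the OSD_TOO_MANY_REPAIRS health check, which apparently fires once an OSD has counted more repaired reads than mon_osd_warn_num_repaired (default 10, if I read the docs correctly). I have not dared to change anything yet, but the commands I found look roughly like this - please correct me if they are wrong or not appropriate here:

    # show the warning threshold for repaired reads
    ceph config get osd mon_osd_warn_num_repaired
    # supposedly resets the repaired-reads counter without restarting the OSD
    ceph tell osd.3 clear_shards_repaired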
Thanks a lot for reading,
Marianne
Attachment:
ceph-osd.2.log.gz
Description: GNU Zip compressed data