Re: Bluestore issue using 18.2.2

Hi Eugen,

isn't every shard/replica on every OSD written and read with a checksum? Even if only the primary held a checksum, it should be possible to identify the damaged shard/replica during a deep-scrub (even with replication 1).
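
Just as a reference (untested from my side; the pool name is a placeholder and 2.3 is one of the PGs from the report below), the scrub findings can be inspected with:

  ceph health detail
  rados list-inconsistent-pg <pool-name>
  rados list-inconsistent-obj 2.3 --format=json-pretty

The last command should show which shard reported the read/checksum error, i.e. which copy is the damaged one.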

Apart from that, it is unusual to see a virtual disk have read errors. If it's some kind of pass-through mapping, there is probably something incorrectly configured with a write cache. Still, this would only be a problem if the VM dies unexpectedly. Something is off with the setup (unless the underlying physical device behind the virtual disks actually does have damage).
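
If the VMs happen to be managed with libvirt (just a guess on my side; the VM name is a placeholder), the cache mode of the virtual disks can be checked in the domain XML:

  virsh dumpxml <vm-name> | grep -i "cache="

For disks that back OSDs, cache='none' is the usual conservative choice; cache='unsafe', or any setup where flushes are not passed down to the real device, is where an unexpected VM or host death can lose acknowledged writes.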

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, August 14, 2024 9:05 AM
To: ceph-users@xxxxxxx
Subject: Re: Bluestore issue using 18.2.2

Hi,

it looks like you're using size 2 pool(s); I strongly advise
increasing that to 3 (and min_size = 2). Although it's unclear why
the PGs get damaged in the first place, repairing a PG with only
two replicas is difficult: which copy is the correct one? So avoid
pools with size 2, except for tests or for data you don't care
about. If you want to use the current situation to learn, you could
try to inspect the PGs with the ceph-objectstore-tool, find out
which replica is the correct one, export it and then inject it into
the OSD. But this can be tricky, of course.
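
Roughly (untested; pool name, OSD id and PG id are placeholders
taken from your report, and please check the documentation before
doing this on a cluster you care about), that would look like:

  # make the pool safer going forward
  ceph osd pool set <pool-name> size 3
  ceph osd pool set <pool-name> min_size 2

  # inspect/export one copy of the PG with its OSD stopped
  ceph orch daemon stop osd.2
  cephadm shell --name osd.2
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
      --pgid 2.3 --op export --file <export-file>

The exported copy could then be imported on the other OSD with
--op import (after removing the bad copy there), but again, only do
that once you're sure which replica is actually the good one.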

Regards,
Eugen

Quoting Marianne Spiller <marianne@xxxxxxxxxx>:

> Hi,
>
> I am trying to gather experience on a Ceph STAGE cluster; it
> consists of virtual machines - which is not perfect, I know. The VMs
> are running Debian 12 and podman-4.3.1. There is practically no load
> on this Ceph - there is just one client using the storage, and it
> makes no noise. So this is what happened:
>
> * "During data consistency checks (scrub), at least one PG has been
> flagged as being damaged or inconsistent."
> * so I listed them (["2.3","2.58"])
> * and tried to repair ("ceph pg repair 2.3", "ceph pg repair 2.58")
> * they both went well (resulting in "pgs: 129 active+clean"), but
> the cluster kept its "HEALTH_WARN" state ("Too many repaired reads
> on 1 OSDs")
> * so I googled for this message; the only thing I found was to
> restart the OSD to get rid of this message and - more importantly -
> the cluster WARN state ("ceph orch daemon restart osd.3")
> * after the restart, my cluster was still in WARN state - and
> complained about 2 PGs being flagged as damaged or inconsistent -
> but this time other PGs on other OSDs
> * I "ceph pg repair"ed them, too, and the cluster's state was WARN
> afterwards, again ("Too many repaired reads on 1 OSDs")
> * when I restarted the OSD ("ceph orch daemon restart osd.2"), the
> crash occurred; Ceph marked this OSD "down" and "out" and suspected
> a hardware issue, while the OSD HDDs are in fact QEMU "hard disks"
> * I can't judge whether it's a serious bug or just due to my
> non-optimal STAGE setup, so I'll attach the gzipped log of osd.2
>
> I need help to understand what happened and how to prevent it in
> the future. What is this "Too many repaired reads" and how do I
> deal with it?
>
> Thanks a lot for reading,
>   Marianne


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx