Hi Frank,
you may be right about the checksums, but I just wanted to point out
the risks of having size 2 pools in general. Since there has been no
response in the thread yet, I wanted to bump it a bit.
Quoting Frank Schilder <frans@xxxxxx>:
Hi Eugen,
isn't every shard/replica on every OSD read and written with a
checksum? Even if only the primary holds a checksum, it should be
possible to identify the damaged shard/replica during deep-scrub
(even for replication 1).
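If I remember correctly, the scrub results can be inspected per PG/object,
which should tell you which shard/replica carries the error (the pg ID
below is just taken from the earlier mails as an example):

  # list the PGs of a pool that scrub flagged as inconsistent
  rados list-inconsistent-pg <pool-name>
  # show the per-shard errors (e.g. checksum mismatches) for one of them
  rados list-inconsistent-obj 2.3 --format=json-pretty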
Apart from that, it is unusual to see a virtual disk have
read errors. If it's some kind of pass-through mapping, there is
probably something incorrectly configured with a write cache. Still,
that would only be a problem if the VM dies unexpectedly. There is
something off with the setup (unless the underlying hardware device
for the VDs actually does have damage).
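Something that might be worth checking on the hypervisor (the VM name
below is made up, just to illustrate the idea):

  # show the disk/driver definition of the VM hosting the OSDs
  virsh dumpxml stage-osd-vm | grep -E "driver name='qemu'"
  # for OSD disks you generally want cache='none' (or 'writethrough'),
  # so writes are not acknowledged from a volatile host-side cache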
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, August 14, 2024 9:05 AM
To: ceph-users@xxxxxxx
Subject: Re: Bluestore issue using 18.2.2
Hi,
it looks like you're using size 2 pool(s); I strongly advise increasing
that to 3 (and min_size = 2). Although it's unclear why the PGs get
damaged, repairing a PG with only two replicas is difficult: which of
the two is the correct one? So avoid pools with size 2, except for
tests or data you don't care about.
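Roughly like this (the pool name is just a placeholder):

  ceph osd pool set <pool-name> size 3
  ceph osd pool set <pool-name> min_size 2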
If you want to use the current situation to learn, you could try to
inspect the PGs with the ceph-objectstore-tool, find out which replica
is the correct one, export it and then inject it into the OSD.
But this can be tricky, of course.
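A rough sketch of that procedure (OSD IDs, pg ID and paths are only
examples, and the OSDs involved have to be stopped while the tool runs;
in a cephadm/podman deployment you would run it inside
"cephadm shell --name osd.<id>"):

  # on the OSD that holds the copy you consider good
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
      --pgid 2.3 --op export --file /tmp/pg2.3.export
  # on the OSD that should receive it; the existing (bad) copy may
  # have to be removed first with --op remove
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
      --pgid 2.3 --op import --file /tmp/pg2.3.export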
Regards,
Eugen
Quoting Marianne Spiller <marianne@xxxxxxxxxx>:
Hi,
I am trying to gather experience on a Ceph STAGE cluster; it
consists of virtual machines - which is not perfect, I know. The VMs
are running Debian 12 and podman-4.3.1. There is practically no load
on this Ceph - there is just one client using the storage, and it
generates hardly any traffic. This is what happened:
* "During data consistency checks (scrub), at least one PG has been
flagged as being damaged or inconsistent."
* so I listed them (["2.3","2.58"])
* and tried to repair ("ceph pg repair 2.3", "ceph pg repair 2.58")
* they both went well (resulting in "pgs: 129 active+clean"), but
the cluster kept its "HEALTH_WARN" state ("Too many repaired reads
on 1 OSDs")
* so I googled for this message, and the only thing I found was to
restart the OSD to get rid of the message and, more importantly,
the cluster WARN state ("ceph orch daemon restart osd.3")
* after the restart, my cluster was still in WARN state and
complained about "2 PGs has been flagged as being damaged or
inconsistent" - but this time for other PGs on other OSDs
* I "ceph pg repair"ed them, too, and afterwards the cluster was
again in WARN state ("Too many repaired reads on 1 OSDs")
* when I restarted the OSD ("ceph orch daemon restart osd.2"), the
crash occurred; Ceph marked this OSD "down" and "out" and suspected a
hardware issue, although the OSD HDDs are in fact QEMU "harddisks"
* I can't judge whether it's a serious bug or just due to my
non-optimal STAGE setup, so I'll attach the gzipped log of osd.2
I need help understanding what happened and how to prevent it in the
future. What is this "Too many repaired reads" warning, and how do I
deal with it?
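Would something like the following be the right way to dig into it? This
is only what I pieced together from the docs so far, with osd.2 being
the affected OSD:

  ceph health detail                      # should show which OSD triggers "Too many repaired reads"
  ceph tell osd.2 clear_shards_repaired   # apparently resets the repaired-reads counter behind the warning

If I read the docs correctly, the threshold behind the warning is
mon_osd_warn_num_repaired (default 10).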
Thanks a lot for reading,
Marianne
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx