Re: Bluestore issue using 18.2.2

Hi,

it looks like you're using size 2 pool(s); I strongly advise increasing that to 3 (with min_size = 2). It's unclear why the PGs got damaged, but repairing a PG with only two replicas is difficult: which of the two copies is the correct one? So avoid pools with size 2 except for tests or for data you don't care about. If you want to use the current situation to learn, you could try to inspect the PGs with ceph-objectstore-tool, find out which replica is the correct one, export it and then inject it into the OSD. But this can be tricky, of course.
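A very rough sketch of what that could look like (the pool name "rbd", the OSD data paths and the OSD/PG IDs below are assumptions, adjust them to your setup; the OSDs must be stopped while ceph-objectstore-tool runs on them):

  # raise the replica count on the pool
  ceph osd pool set rbd size 3
  ceph osd pool set rbd min_size 2

  # export the PG from the OSD holding the good copy ...
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
      --pgid 2.3 --op export --file /tmp/pg.2.3.export

  # ... remove the bad copy on the other OSD and import the exported one
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
      --pgid 2.3 --op remove --force
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
      --pgid 2.3 --op import --file /tmp/pg.2.3.export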

Regards,
Eugen

Quoting Marianne Spiller <marianne@xxxxxxxxxx>:

Hi,

I am trying to gather experience on a Ceph STAGE cluster; it consists of virtual machines - which is not perfect, I know. The VMs are running Debian 12 and podman-4.3.1. There is practically no load on this Ceph - there is just one client using the storage, and it makes no noise. So this is what happened:

* "During data consistency checks (scrub), at least one PG has been flagged as being damaged or inconsistent."
* so I listed them (["2.3","2.58"])
* and tried to repair ("ceph pg repair 2.3", "ceph pg repair 2.58")
* they both went well (resulting in "pgs: 129 active+clean"), but the cluster kept its "HEALTH_WARN" state ("Too many repaired reads on 1 OSDs")
* so I searched for this message; the only thing I found was to restart the OSD to get rid of the message and - more importantly - of the cluster WARN state ("ceph orch daemon restart osd.3")
* after the restart, my cluster was still in WARN state - and complained about "2 PGs has been flagged as being damaged or inconsistent" - but this time other PGs on other OSDs
* I "ceph pg repair"ed them, too, and afterwards the cluster's state was WARN again ("Too many repaired reads on 1 OSDs")
* when I restarted the OSD ("ceph orch daemon restart osd.2"), the crash occurred; Ceph marked this OSD "down" and "out" and suspected a hardware issue, while the OSD HDDs are in fact QEMU "harddisks"
* I can't judge whether it's a serious bug or just due to my non-optimal STAGE setup, so I'll attach the gzipped log of osd.2
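In commands, the sequence was roughly the following (the listing step with the pool name "rbd" is an assumption from memory, the rest matches the output quoted above):

  # list the PGs flagged as inconsistent by scrub
  rados list-inconsistent-pg rbd      # returned ["2.3","2.58"]

  # ask the primary OSDs to repair them
  ceph pg repair 2.3
  ceph pg repair 2.58

  # check the result and the remaining warning
  ceph -s
  ceph health detail                  # "Too many repaired reads on 1 OSDs"

  # restart the affected OSD via the orchestrator
  ceph orch daemon restart osd.3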

I need help to understand what happened and how to prevent it in the future. What is this "Too many repaired reads" warning, and how do I deal with it?

Thanks a lot for reading,
  Marianne


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


