Re: Sometimes PGs inconsistent (although there is no load on them)

This sounds like you have widespread inconsistencies that are surfaced by scrubs, not caused by them (see the sketch after this list for one way to check where the errors land).  Frequent causes:

* A RAID HBA with firmware bugs (all of them, in my experience), broken preserved-cache replay, writeback cache forced on without a BBU, etc.
* Power that was out for longer than the BBUs could last
* Volatile write cache enabled on HDDs
* Client-grade SSDs without PLP (power loss protection)
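
If you want to narrow this down before restarting the Proxmox hosts, it can help to check whether the scrub errors cluster on particular OSDs and hosts or keep hopping around. A minimal sketch, assuming the rados CLI works on an admin node with the usual ceph.conf/keyring and that the pool name is passed as the first argument (names are illustrative, adjust for your setup):

    #!/usr/bin/env python3
    # Rough sketch: tally scrub errors per OSD across all inconsistent PGs in a pool.
    # Assumes the rados CLI is on PATH and the pool name is given as argv[1].
    import json
    import subprocess
    import sys
    from collections import Counter

    def run_json(cmd):
        # Run a CLI command and parse its JSON output.
        return json.loads(subprocess.check_output(cmd))

    pool = sys.argv[1]
    errors_per_osd = Counter()

    # "rados list-inconsistent-pg <pool>" prints a JSON array of inconsistent PG ids.
    for pgid in run_json(["rados", "list-inconsistent-pg", pool]):
        # Per-object scrub error details for that PG (needs scrub info from the last deep scrub).
        report = run_json(["rados", "list-inconsistent-obj", pgid, "--format=json"])
        for obj in report.get("inconsistents", []):
            for shard in obj.get("shards", []):
                # Count each error reported against the shard's OSD
                # (read_error, data_digest_mismatch, missing, ...).
                for err in shard.get("errors", []):
                    errors_per_osd[(shard["osd"], err)] += 1

    for (osd, err), count in sorted(errors_per_osd.items()):
        print(f"osd.{osd}\t{err}\t{count}")

If the errors keep landing on different OSDs across different hosts, that points towards lost writes (controller or drive cache) rather than a single failing disk. It is also worth confirming that volatile write cache is disabled on the HDDs (hdparm -W on each data disk) and reviewing the RAID controller's cache policy.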

> On Mar 11, 2025, at 6:57 AM, Martin Konold <martin.konold@xxxxxxxxxx> wrote:
> 
> 
> Hi,
> 
> I suspect a hw issue. Please check the networks.
> 
> Regards 
> --martin
> 
> On 11.03.2025 11:24, Marianne Spiller <marianne@xxxxxxxxxx> wrote:
> Dear list,
> 
> I'm currently maintaining several Ceph (prod) installations. One of them consists of 3 MON hosts and 6 OSD hosts with 40 OSDs in total. There are also 5 separate Proxmox hosts - they only run the VMs and use the storage provided by Ceph, but they are not part of the Ceph cluster.
> 
> The worst case happened: due to an outage, all of these hosts crashed at nearly the same time.
> 
> Last week, I began restarting them (only the Ceph hosts; the Proxmox servers are still down). Ceph was very unhappy with the situation as a whole: one OSD host (and its 6 OSDs) is completely gone, there are some hardware issues (33 OSDs left, networking, PSU; I'm working on it), and 73 out of 129 PGs were inconsistent.
> 
> Meanwhile, the overall status of the cluster is "HEALTHY" again.
> But nearly every day, one or two PGs get damaged - never on the same OSDs. And there is no traffic on the storage, as the virtualization hosts are not running. I see no further reason in the logs: everything is fine, a scrub starts and leaves one or more PGs damaged. Repairing them succeeds, but perhaps the next night, another PG is affected.
> 
> Do you have hints on how to investigate this further? I would love to understand more before starting the Proxmox cluster again. Using Ceph 18.2.4 (Proxmox packages).
> 
> Thanks a lot,
>   Marianne
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



