Re: PG inconsistent

----- On 12 Apr 24, at 15:17, Albert Shih Albert.Shih@xxxxxxxx wrote:

> On 12/04/2024 at 12:56:12+0200, Frédéric Nass wrote
>> 
> Hi,
> 
>> 
>> Have you checked the hardware status of the involved drives with anything
>> other than smartctl? For example, with the manufacturer's tools / WebUI
>> (iDRAC / perccli on Dell hardware).
> 
> Yes, all my disks are «under» periodic checks with smartctl + icinga.

Actually, I meant lower-level tools (drive / server vendor tools).

> 
>> If these tools don't report any media errors (that is, bad blocks on disks), then
>> you might just be facing the bit rot phenomenon. But this is very rare and
>> should happen in a sysadmin's lifetime as often as a Royal Flush hand in a
>> professional poker player's lifetime. ;-)
>> 
>> If no media error is reported, then you might want to check and update the
>> firmware of all drives.
> 
> You're perfectly right.
> 
> It was just a newbie error: I checked the «main» OSD of the PG (meaning the
> first in the list) but forgot to check the others.
> 

Ok.
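
To avoid looking only at the primary OSD, the per-shard errors of an inconsistent PG can be listed with `rados list-inconsistent-obj <pgid> --format=json`. Below is a minimal Python sketch that reads such a report and returns, per object, every OSD whose shard carries an error. The function name and the sample data (object name, OSD ids) are made up for illustration, and the JSON is reduced to the fields the sketch uses, so treat it as an assumption about the layout rather than a full schema.

```python
import json


def osds_with_shard_errors(report_json):
    """Map each inconsistent object name to the OSD ids whose shard lists errors."""
    report = json.loads(report_json)
    result = {}
    for item in report.get("inconsistents", []):
        name = item["object"]["name"]
        # Look at every shard, not just the primary one.
        bad = [s["osd"] for s in item.get("shards", []) if s.get("errors")]
        result[name] = bad
    return result


# Hypothetical, trimmed-down sample; real reports contain many more fields.
SAMPLE = json.dumps({
    "epoch": 1,
    "inconsistents": [
        {
            "object": {"name": "example_object"},
            "errors": [],
            "union_shard_errors": ["read_error"],
            "shards": [
                {"osd": 12, "primary": True, "errors": []},
                {"osd": 27, "primary": False, "errors": ["read_error"]},
                {"osd": 41, "primary": False, "errors": []},
            ],
        }
    ],
})

if __name__ == "__main__":
    print(osds_with_shard_errors(SAMPLE))
```

In this made-up sample the primary (osd 12) is clean and the error sits on a secondary shard, which is the situation described above.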

> On one server I do indeed get some errors on a disk.
> 
> But strangely, smartctl reports nothing. I will add a check with dmesg.

That's why I pointed you to the drive / server vendor tools earlier as sometimes smartctl is missing the information you want.
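
For the dmesg check mentioned above, here is a rough Python sketch that counts kernel log lines reporting I/O or medium errors per block device. The regular expression and the sample lines are illustrative assumptions: the exact kernel wording varies between kernel versions and drivers, so the pattern would need adjusting to what your servers actually log.

```python
import re

# Illustrative pattern only; adapt it to the messages your kernel emits.
ERROR_RE = re.compile(r"(I/O error|medium error).*?\bdev (\w+)", re.IGNORECASE)


def devices_with_errors(lines):
    """Count matching error lines per block device name."""
    counts = {}
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            dev = match.group(2)
            counts[dev] = counts.get(dev, 0) + 1
    return counts


# Hypothetical sample lines, not taken from a real system.
SAMPLE_LINES = [
    "[100.000001] blk_update_request: I/O error, dev sdc, sector 2048",
    "[100.000002] blk_update_request: critical medium error, dev sdc, sector 4096",
    "[100.000003] EXT4-fs (sda1): mounted filesystem",
]

if __name__ == "__main__":
    print(devices_with_errors(SAMPLE_LINES))
```

On a real host the input could come from the output of `dmesg` split into lines. A monitoring check (icinga2 or otherwise) could then alert whenever the returned dictionary is not empty.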

> 
>> 
>> Once you have figured it out, you may enable osd_scrub_auto_repair=true to have
>> these inconsistencies repaired automatically on deep-scrubbing, but make sure
>> you're using the alert module [1] so that you at least get informed about the
>> scrub errors.
> 
> Thanks. I will look into it. We already have icinga2 on site, so I use
> icinga2 to check the cluster.
> 
> Is there a list of what the alert module is going to check?

Basically, the module checks for changes in the Ceph status (ceph -s).

https://github.com/ceph/ceph/blob/main/src/pybind/mgr/alerts/module.py
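
To illustrate what "checking for status changes" amounts to, here is a simplified Python sketch that compares two health snapshots and reports which health checks appeared and which cleared. It is not the module's actual code: the function name is hypothetical, and the snapshots are hand-written dictionaries assumed to follow the `checks` layout of `ceph health --format json`.

```python
def diff_health_checks(previous, current):
    """Return (new, cleared) health check codes between two health snapshots."""
    prev = previous.get("checks", {})
    curr = current.get("checks", {})
    new = sorted(set(curr) - set(prev))
    cleared = sorted(set(prev) - set(curr))
    return new, cleared


# Hypothetical snapshots, reduced to the fields used here.
BEFORE = {"status": "HEALTH_OK", "checks": {}}
AFTER = {
    "status": "HEALTH_ERR",
    "checks": {
        "OSD_SCRUB_ERRORS": {
            "severity": "HEALTH_ERR",
            "summary": {"message": "1 scrub errors"},
        },
        "PG_DAMAGED": {
            "severity": "HEALTH_ERR",
            "summary": {"message": "Possible data damage: 1 pg inconsistent"},
        },
    },
}

if __name__ == "__main__":
    new, cleared = diff_health_checks(BEFORE, AFTER)
    print("new:", new)
    print("cleared:", cleared)
```

The same comparison would work from icinga2: keep the previous snapshot, fetch the current one, and notify when either list is non-empty.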

Regards,
Frédéric.

> 
> 
> Regards
> 
> JAS
> --
> Albert SHIH 🦫 🐸
> France
> Local time:
> Fri 12 April 2024 15:13:13 CEST
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



