----- On 12 Apr 24, at 15:17, Albert Shih Albert.Shih@xxxxxxxx wrote:

> On 12/04/2024 at 12:56:12+0200, Frédéric Nass wrote:
>
>> Hi,
>>
>> Have you checked the hardware status of the involved drives other than with
>> smartctl? For example with the manufacturer's tools / WebUI (iDrac / perccli
>> for DELL hardware).
>
> Yes, all my disks are «under» periodic check with smartctl + icinga.

Actually, I meant lower-level tools (drive / server vendor tools).

>
>> If these tools don't report any media error (that is, bad blocks on disks),
>> then you might just be facing the bit rot phenomenon. But this is very rare
>> and should happen in a sysadmin's lifetime about as often as a Royal Flush
>> hand in a professional poker player's lifetime. ;-)
>>
>> If no media error is reported, then you might want to check and update the
>> firmware of all drives.
>
> You're perfectly right.
>
> It was just a newbie error: I checked the «main» OSD of the PG (meaning the
> first in the list) but forgot to check the others.

Ok.

> On one server I indeed get some errors on a disk.
>
> But strangely smartctl reports nothing. I will add a check with dmesg.

That's why I pointed you to the drive / server vendor tools earlier, as
smartctl sometimes misses the information you want.

>
>> Once you've figured it out, you may enable osd_scrub_auto_repair=true to have
>> these inconsistencies repaired automatically on deep scrubbing, but make sure
>> you're using the alert module [1] so as to at least get informed about the
>> scrub errors.
>
> Thanks, I will look into it, because we already have icinga2 on site, so I use
> icinga2 to check the cluster.
>
> Is there a list of what the alert module is going to check?

Basically, the module checks for ceph status (ceph -s) changes.

https://github.com/ceph/ceph/blob/main/src/pybind/mgr/alerts/module.py

Regards,
Frédéric.

> Regards
>
> JAS
> --
> Albert SHIH 🦫 🐸
> France
> Local time: Fri
> 12 April 2024 15:13:13 CEST

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
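For reference, a minimal sketch of the settings discussed in the thread: turning on automatic repair of scrub inconsistencies and enabling the alerts mgr module. The SMTP host and addresses below are illustrative placeholders, not values from this thread; adjust them for your site (or skip SMTP entirely if, like here, Icinga2 watches `ceph -s` instead).

```shell
# Repair inconsistencies automatically when deep scrubbing finds them.
ceph config set osd osd_scrub_auto_repair true

# Enable the alerts mgr module, which reports on ceph status changes.
ceph mgr module enable alerts

# The alerts module sends mail via SMTP; placeholder values shown.
ceph config set mgr mgr/alerts/smtp_host smtp.example.com
ceph config set mgr mgr/alerts/smtp_destination admin@example.com
ceph config set mgr mgr/alerts/smtp_sender ceph@example.com

# Check overall health and any scrub errors.
ceph -s
ceph health detail
```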