Re: Removing OSD after fixing PG-inconsistent brings back PG-inconsistent state

Hi again,

In the end recovery did start; we just had to wait 5 minutes (I'm guessing that is the mon_osd_down_out_interval). Recovery finished with the PG inconsistent again (scrub error). Afterwards we ran ceph pg repair again and got HEALTH_OK back. So we could finally take the broken OSD down!
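
For anyone in the same situation, here is a rough sketch of the order of operations, written as a small Python helper that only prints the commands. The helper name is made up, and OSD id 11 is an assumption for illustration (it is the primary in the acting set of the quoted post); the ceph subcommands are the ones from the usual manual OSD removal procedure. Marking the OSD out by hand should start the remapping without waiting for the down-out timer.

```python
def osd_removal_plan(osd_id, pgid):
    """Return (reason, argv) steps for draining and removing one OSD.

    Nothing is executed here; the caller decides how and when to run each
    step, and must wait for recovery to finish before the removal steps.
    The OSD daemon also has to be stopped before 'ceph osd rm' will work.
    """
    osd = str(osd_id)
    return [
        ("mark the OSD out so its data is remapped", ["ceph", "osd", "out", osd]),
        ("check progress; repeat until recovery has finished", ["ceph", "health", "detail"]),
        ("repair the PG if it is flagged inconsistent again", ["ceph", "pg", "repair", pgid]),
        ("remove the OSD from the CRUSH map", ["ceph", "osd", "crush", "remove", "osd." + osd]),
        ("delete its authentication key", ["ceph", "auth", "del", "osd." + osd]),
        ("remove the OSD from the cluster", ["ceph", "osd", "rm", osd]),
    ]

# Assumed values for illustration: OSD 11 and PG 2.ae.
for why, argv in osd_removal_plan(11, "2.ae"):
    print("# " + why)
    print(" ".join(argv))
```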

However, I find it strange that the PG kept coming back as inconsistent whenever the primary OSD went down (that OSD had a few I/O errors, which we guess caused the inconsistency). We had already run ceph pg repair, got HEALTH_OK, forced a pg scrub and a pg deep-scrub, and the PG was active and healthy. Yet as soon as we took the OSD down, the inconsistent (scrub error) state came back.

Just for the sake of curiosity, if anyone has debugging ideas...
Cheers,

Ana


----- Original Message -----
From: "Ana Aviles" <ana@xxxxxxxxxxxx>
To: ceph-users@xxxxxxxx
Sent: Friday, July 29, 2016 7:39:11 PM
Subject:  Removing OSD after fixing PG-inconsistent brings back PG-inconsistent state

Hello,

We have a cluster in HEALTH_ERR due to an inconsistent PG.

HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 2.ae is active+clean+inconsistent, acting [11,4]
1 scrub errors
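
When scripting around this state, the health lines above can be parsed to pull out the PG id and its acting set. A minimal Python sketch, assuming the exact line format shown above (the function name is made up, and the format may differ between Ceph releases):

```python
import re

# Matches lines such as:
#   pg 2.ae is active+clean+inconsistent, acting [11,4]
PG_LINE = re.compile(r"^pg (\S+) is (\S+), acting \[([\d,]+)\]$")

def inconsistent_pgs(health_detail):
    """Return (pgid, acting_osds) for every PG whose state includes 'inconsistent'."""
    found = []
    for line in health_detail.splitlines():
        match = PG_LINE.match(line.strip())
        if match and "inconsistent" in match.group(2).split("+"):
            acting = [int(osd) for osd in match.group(3).split(",")]
            found.append((match.group(1), acting))
    return found

sample = """HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 2.ae is active+clean+inconsistent, acting [11,4]
1 scrub errors"""

print(inconsistent_pgs(sample))  # [('2.ae', [11, 4])]
```

The first OSD in the acting set is the primary, so here that would be osd.11.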

We ran ceph pg repair on the problematic PG and health went back to OK.

I checked the two OSDs acting for that PG (we have 2 replicas here) and
one of them had I/O errors, which we assume caused the inconsistent PG
in the first place. So, to avoid further problems, we want to remove
that disk from the cluster. However, as soon as we stop the OSD, the PG
becomes inconsistent again and recovery won't start.

Any ideas what could be happening? Why does the PG go back to
inconsistent? How can we remove the failing disk?

I can't find any ERR in the OSD logs, only in the monitor logs. So I
can't see whether a specific object is causing the inconsistent state
(that doesn't seem to be the case).
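
One possible way to look for a specific object is `rados list-inconsistent-obj <pgid>`, which as far as I know is available from Jewel onwards and only reports the findings of a recent scrub of that PG. A minimal Python sketch for summarizing its JSON output follows; the helper names are made up, the sample report is hypothetical, and the field names (`inconsistents`, `object`, `shards`, `osd`, `errors`) are assumptions that may vary by release.

```python
import json

def list_inconsistent_cmd(pgid):
    """Build the command line; run it with subprocess or by hand."""
    return ["rados", "list-inconsistent-obj", pgid, "--format=json"]

def summarize(report_json):
    """Return [(object_name, [(osd, errors), ...]), ...] from the JSON report."""
    report = json.loads(report_json)
    summary = []
    for item in report.get("inconsistents", []):
        name = item.get("object", {}).get("name")
        shards = [(s.get("osd"), s.get("errors", [])) for s in item.get("shards", [])]
        summary.append((name, shards))
    return summary

# Hypothetical report, for illustration only; not taken from this cluster.
sample_report = json.dumps({
    "inconsistents": [
        {
            "object": {"name": "example-object"},
            "shards": [
                {"osd": 4, "errors": []},
                {"osd": 11, "errors": ["read_error"]},
            ],
        }
    ]
})

print(" ".join(list_inconsistent_cmd("2.ae")))
print(summarize(sample_report))
```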

I attach the ceph pg query output taken while in HEALTH_ERR.

Any help would be much appreciated. Thanks!

-- 
Ana Avilés
Greenhost - sustainable hosting & digital security
E: ana@xxxxxxxxxxxx
T: +31 20 4890444
W: https://greenhost.nl

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com