Re: Another cluster completely hang

This time, at the end of the recovery procedure you described, the result was roughly: most pgs active+clean, 20 pgs incomplete.
After that, when trying to use the cluster, I got "request blocked more than" warnings and no VM could start.
I know that something happened after the disk broke, probably a server reboot. I am investigating.
But even if I find the origin of the problem, it will not help me find a solution now.
So I am spending my time repairing the pool only to save the production data; I will throw away the rest.
Now, after marking all pgs as complete with ceph_objectstore_tool, I see that:

1) Ceph has put out three HDDs (I suppose due to scrub, but that is only my idea; I will check the logs) BAD
2) it is recovering the degraded and misplaced objects GOOD
3) VMs are not usable yet BAD
4) I see some pgs in state down+peering (I hope that is not BAD; see the commands below)
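
For reference, these are the kinds of commands I am using to check the blocked requests and the down+peering pgs (the pg id 2.3f is only a placeholder, not a real one from my cluster):

# overall health, including blocked requests and stuck pgs
ceph health detail
# list pgs stuck in inactive / unclean states
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean
# detailed peering and recovery state of a single pg
ceph pg 2.3f query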

Regarding 1): how can I put those three HDDs back in the cluster? Should I remove them from CRUSH and start again?
Can I tell Ceph that they are not bad?
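
To make the question concrete, this is the procedure I would try, assuming the OSDs were only marked down/out and were never removed from CRUSH (osd.5 is a placeholder id); please correct me if this is wrong:

# see which OSDs are down/out and where they sit in the CRUSH tree
ceph osd tree
# restart the OSD daemon on the host that owns the disk
# (systemd hosts; on sysvinit hosts: service ceph start osd.5)
systemctl start ceph-osd@5
# once the daemon is up again, mark it back "in" so it takes data
ceph osd in 5
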
Mario

On Wed, 29 Jun 2016 at 15:34, Lionel Bouton <lionel+ceph@xxxxxxxxxxx> wrote:
Hi,

On 29/06/2016 12:00, Mario Giammarco wrote:
> Now the problem is that ceph has put out two disks because scrub has
> failed (I think it is not a disk fault but due to mark-complete)

There is something odd going on. I've only seen deep-scrubs failing (i.e.
detecting one inconsistency and marking the pg accordingly), so I'm not sure
what happens in the case of a "simple" scrub failure, but what should not
happen is the whole OSD going down on a scrub or deep-scrub failure, which
you seem to imply did happen.
Do you have logs for these two failures giving a hint at what happened
(probably /var/log/ceph/ceph-osd.<n>.log)? Any kernel log pointing to
hardware failure(s) around the time these events happened?
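
For example, something along these lines might help (the grep patterns and log paths are only suggestions, adjust the OSD ids and paths to your distribution):

# look for scrub errors, failed asserts or suicides in the OSD logs
grep -iE 'scrub|suicide|assert|abort' /var/log/ceph/ceph-osd.*.log
# look for disk or controller errors in the kernel log around the same time
dmesg -T | grep -iE 'i/o error|medium error|ata[0-9]'
grep -iE 'i/o error|medium error' /var/log/kern.log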

Another point: you said that you had one disk "broken". Usually Ceph
handles this case in the following manner:
- the OSD detects the problem and commits suicide (unless it's configured
to ignore IO errors, which is not the default),
- your cluster is then in a degraded state with one OSD down/in,
- after a timeout (several minutes), Ceph decides that the OSD won't
come up again soon and marks the OSD "out" (so one OSD down/out),
- as the OSD is out, CRUSH adapts pg positions based on the remaining
available OSDs and brings all degraded pgs back to a clean state by creating
the missing replicas while moving pgs around. You see a lot of IO and many
pgs in wait_backfill/backfilling states at this point,
- when all is done, the cluster is back to HEALTH_OK (a few commands to
watch this are sketched below).
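
A minimal way to watch that process while it runs (the timeout above is controlled by mon_osd_down_out_interval, 600 seconds by default if I remember correctly):

# follow cluster events and recovery progress in real time
ceph -w
# overall state: degraded/misplaced objects, pg states, OSDs up/in
ceph -s
# which OSDs are down/out and how CRUSH sees them
ceph osd tree
# read the down->out timeout on a monitor host via the admin socket
# (assuming the mon id is the short hostname, which is common but not guaranteed)
ceph daemon mon.$(hostname -s) config get mon_osd_down_out_interval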

When your disk broke and you waited 24 hours, how far along this
process was your cluster?

Best regards,

Lionel