Re: [ceph-users] Mimic cluster is offline and not healing


 



Quoting morphin (morphinwithyou@xxxxxxxxx):
> After 72 hours I believe we may hit a bug. Any help would be greatly
> appreciated.

Is it feasible for you to stop all client IO to the Ceph cluster? At
least until it stabilizes again. "ceph osd pause" would do the trick
(ceph osd unpause would unset it). 
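
A rough sketch of that sequence, in case it helps. The function names
are only illustrative; the ceph subcommands are the two mentioned above
plus "ceph osd dump" to confirm that the pause flags are really set:

```shell
#!/bin/sh
# Illustrative helper (not part of Ceph): pause client IO, then show the
# cluster-wide flags so you can confirm the pause took effect.
pause_client_io() {
    ceph osd pause || return 1
    # "ceph osd dump" prints a line starting with "flags"; after a pause
    # it should list pauserd and pausewr.
    ceph osd dump | grep '^flags'
}

# Illustrative helper: resume client IO once the cluster is stable again.
resume_client_io() {
    ceph osd unpause
}
```

Run pause_client_io from a node with an admin keyring, and
resume_client_io when the cluster has settled.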

What kind of workload are you running on the cluster? What does your
CRUSH map look like (ceph osd getcrushmap -o /tmp/crush_raw;
crushtool -d /tmp/crush_raw -o /tmp/crush_edit)?

I have seen a (test) Ceph cluster "healing" itself to the point where
there was nothing left to recover on. In *that* case the disks were
overbooked (multiple OSDs per physical disk) ... The flags you set
(noout, nodown, nobackfill, norecover, noscrub, etc., etc.) helped to
get it to recover again. I would try to get all OSDs online again (and
manually keep them up / restart them, because you have set nodown).
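
With nodown set the monitors will not mark a dead OSD down, so you have
to check the daemons on each host yourself. A minimal sketch of that
check, assuming systemd-managed OSDs and the default data directory
layout /var/lib/ceph/osd/ceph-<id> (both are assumptions about your
setup; the function name is only illustrative):

```shell
#!/bin/sh
# Sketch: run as root on each OSD host.
# Assumption: OSD data directories live under /var/lib/ceph/osd/ceph-<id>
# and each OSD runs as the systemd unit ceph-osd@<id>.
OSD_DIR="${OSD_DIR:-/var/lib/ceph/osd}"

# Illustrative helper: start every local OSD whose unit is not active.
start_inactive_osds() {
    for dir in "$OSD_DIR"/ceph-*; do
        [ -d "$dir" ] || continue
        id="${dir##*/ceph-}"
        if ! systemctl is-active --quiet "ceph-osd@$id"; then
            echo "starting osd.$id"
            systemctl start "ceph-osd@$id"
        fi
    done
}
```

Calling start_inactive_osds prints one line per OSD it tries to start.
If an OSD keeps dying, look at its log before starting it yet again.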

Does the cluster recover at all?

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx


