Good news... :) After trying everything else, I decided to re-create my MONs from the OSDs, using this script: https://paste.ubuntu.com/p/rNMPdMPhT5/
And it worked!!! (I put a rough sketch of the general procedure at the end of this mail.)

I think that when the 2 servers crashed and came back at the same time, the MONs somehow got confused and the maps just got corrupted. After the re-creation all the MONs had the same map, so it worked. But I still don't know how on earth the MONs can cause endless 95% I/O??? This is a bug anyway, and if you don't want to run into this problem, do not "enable" your MONs; just start them manually! Another tough lesson.

ceph -s: https://paste.ubuntu.com/p/m3hFF22jM9/

As you can see in the ceph -s output, some of the OSDs are still down, and when I start them they don't come up.
Startup log: https://paste.ubuntu.com/p/ZJQG4khdbx/
Debug log: https://paste.ubuntu.com/p/J3JyGShHym/

What can we do about this problem? What is causing it?

Thank you everyone. You helped me a lot! :)

>
> I think I might have found something.
> When I start an OSD, it generates high I/O (around 95%), and the other OSDs
> are also triggered and all together they generate the same I/O. This is true
> even when I set the noup flag. So all the OSDs generate high I/O whenever
> an OSD starts.
>
> I think this is too much. I have 168 OSDs, and when I start them the OSD I/O
> never finishes. I left the cluster alone for 70 hours and the high I/O
> never finished at all.
>
> We're trying to start the OSDs host by host and wait for things to settle,
> but it takes too much time.
> An OSD cannot even answer "ceph tell osd.158 version", it becomes that
> busy, and this seems to be a loop, since one OSD starting up triggers
> I/O on the other OSDs.
>
> So I collected debug output and I hope it can be examined.
>
> This is the debug=20 OSD log:
> Full log: https://www.dropbox.com/s/pwzqeajlsdwaoi1/ceph-osd.90.log?dl=0
> Shorter log, only the last part before the high I/O finished:
> https://paste.ubuntu.com/p/7ZfwH8CBC5/
> strace -f -p <osd pid>:
> - When I start the OSD: https://paste.ubuntu.com/p/8n2kTvwnG6/
> - After the I/O finished: https://paste.ubuntu.com/p/4sGfj7Bf4c/
>
> Now some people on IRC say this is a bug and suggest trying Ubuntu and the
> new Ceph repo, maybe it will help. I agree with them and I will give it a
> shot. What do you think?
> On Thu, 27 Sep 2018 at 16:27, by morphin <morphinwithyou@xxxxxxxxx> wrote:
> >
> > I should not have any client I/O right now. All of my VMs are down right
> > now. There is only a single pool.
> >
> > Here is my crush map: https://paste.ubuntu.com/p/Z9G5hSdqCR/
> >
> > The cluster does not recover. After starting the OSDs with the specified
> > flags, the OSD up count drops from 168 to 50 within 24 hours.
> > On Thu, 27 Sep 2018 at 16:10, Stefan Kooman <stefan@xxxxxx> wrote:
> > >
> > > Quoting by morphin (morphinwithyou@xxxxxxxxx):
> > > > After 72 hours I believe we may have hit a bug. Any help would be
> > > > greatly appreciated.
> > >
> > > Is it feasible for you to stop all client I/O to the Ceph cluster, at
> > > least until it stabilizes again? "ceph osd pause" would do the trick
> > > ("ceph osd unpause" would unset it).
> > >
> > > What kind of workload are you running on the cluster? What does your
> > > crush map look like (ceph osd getcrushmap -o /tmp/crush_raw;
> > > crushtool -d /tmp/crush_raw -o /tmp/crush_edit)?
> > >
> > > I have seen a (test) Ceph cluster "healing" itself to the point there was
> > > nothing left to recover on. In *that* case the disks were overbooked
> > > (multiple OSDs per physical disk) ... The flags you set (noout, nodown,
> > > nobackfill, norecover, noscrub, etc., etc.) helped to get it to recover
> > > again. I would try to get all OSDs online again (and manually keep them
> > > up / restart them, because you have set nodown).
> > >
> > > Does the cluster recover at all?
> > >
> > > Gr. Stefan
> > >
> > > --
> > > | BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
> > > | GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
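
P.S. For anyone who finds this thread later: the general way to rebuild a
monitor store from the OSDs is the "recovery using OSDs" procedure in the
Ceph docs. I can't promise the script I linked is identical to it, but a
minimal sketch of that procedure looks like this (the mon-store path, OSD
data paths, keyring location and mon id are placeholders, and with more than
one OSD host the mon-store directory has to be carried from host to host so
it accumulates the maps from every OSD):

    # with all ceph-mon and ceph-osd daemons stopped, on each OSD host:
    ms=/tmp/mon-store
    mkdir -p $ms
    for osd in /var/lib/ceph/osd/ceph-*; do
        # extract the cluster maps from this OSD into the temporary mon store
        ceph-objectstore-tool --data-path "$osd" --op update-mon-db --mon-store-path "$ms"
    done

    # then, on one monitor node, rebuild the store from the collected maps
    # (the keyring needs the mon. and client.admin keys with full caps):
    ceph-monstore-tool $ms rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring

    # back up the corrupted store and move the rebuilt one into place
    # (repeat for every monitor; "foo" here stands for the mon id):
    mv /var/lib/ceph/mon/ceph-foo/store.db /var/lib/ceph/mon/ceph-foo/store.db.corrupted
    cp -r $ms/store.db /var/lib/ceph/mon/ceph-foo/store.db
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-foo/store.db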
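
For completeness, this is roughly how we are trying to bring the remaining
OSDs back, host by host, while the flags are set. Only a sketch; the OSD ids
and the sleep/timeout values are made up, and it assumes systemd-managed OSDs:

    # keep recovery and backfill quiet while the OSDs come up
    ceph osd set noout
    ceph osd set norecover
    ceph osd set nobackfill

    # start this host's OSDs one by one and wait until each of them
    # is responsive before starting the next one
    for id in 90 91 92; do                      # example ids only
        systemctl start ceph-osd@$id
        until timeout 10 ceph tell osd.$id version >/dev/null 2>&1; do
            echo "waiting for osd.$id ..."
            sleep 30
        done
    done

    # once everything is up and peered, let recovery run again
    ceph osd unset nobackfill
    ceph osd unset norecover
    ceph osd unset noout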