I mistyped the users list address, so I am correcting it and sending this again. Apologies for the noise.
My mail is below.
Begin forwarded message:
Hi all,

We recently upgraded from Luminous to Mimic, and the cluster has been offline for the 6 days since. The long story short is here: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030078.html

I have also CC'ed the developers since I believe this is a bug. If this is not the correct way to report it, I apologize; please let me know.

A lot has happened over these 6 days, and we reached several conclusions about the problem. Some of them were misjudged and some were not investigated deeply enough. However, the most certain diagnosis is this: each OSD causes very high disk I/O on its BlueStore disk (WAL and DB are fine). After that, the OSDs become unresponsive or barely responsive. For example, "ceph tell osd.x version" hangs seemingly forever.

Because the OSDs are unresponsive, the cluster does not settle. This is our problem! It is the one thing we are very sure of, but we are not sure of the reason.

Here is the latest ceph status: https://paste.ubuntu.com/p/2DyZ5YqPjh/
This is the status after we started all of the OSDs 24 hours ago. Some of the OSDs are not started. However, it did not make any difference when all of them were online.

Here is the debug=20 log of one OSD, which is the same for all the others: https://paste.ubuntu.com/p/8n2kTvwnG6/
As far as we can tell, there is a loop pattern. I am sure it will not escape your eye.

Here is the full log of the same OSD: https://www.dropbox.com/s/pwzqeajlsdwaoi1/ceph-osd.90.log?dl=0

Here is the strace of the same OSD process: https://paste.ubuntu.com/p/8n2kTvwnG6/

Lately we hear more and more about upgrading to Mimic. I hope nobody gets hurt the way we did. I am sure we made many mistakes that let this happen. This situation may serve as an example for other users, and it could point to a potential bug for the Ceph developers.

Any help figuring out what is going on would be great.

Best Regards,
Goktug Yildirim
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com