The only possible hint is that the crash coincides with the start of a scrub time interval. Why it didn't happen yesterday at the same time, I have no idea. I restored the default debug settings in the
hope of getting a bit more info when the next crash happens. I would really like to debug only specific components rather than turning everything up to 20. Sorry for hijacking the post, I will create a new one
when I have more information.
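For what it's worth, a minimal sketch of raising debug levels for specific components only rather than globally; the component names, the 10/10 levels, and the osd.12 target below are placeholders, not settings recommended anywhere in this thread:

# persistently raise one component's log level on all OSDs (example values)
ceph config set osd debug_osd 10/10
# or temporarily, on a single running daemon
ceph tell osd.12 injectargs '--debug_osd 10 --debug_ms 1'
# revert to the default afterwards
ceph config rm osd debug_osd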
On 7/23/2019 9:50 PM, Alex Litvak wrote:
I just had an OSD crash with no logs (debug was not enabled). It happened 24 hours after the actual upgrade from 14.2.1 to 14.2.2. Nothing else changed as far as environment or load. The disk is OK.
I restarted the OSD and it came back. The cluster had been up for 2 months before the upgrade without an issue.
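As an aside, Nautilus includes a mgr crash module that records crash metadata even when daemon debug logging is off; a rough sketch, where the crash ID is a placeholder taken from the ls output:

ceph mgr module enable crash   # if not already enabled
ceph crash ls
ceph crash info <crash-id>     # placeholder; use an ID listed by 'ceph crash ls'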
On 7/23/2019 2:56 PM, Nathan Fish wrote:
I have not had any more OSDs crash, but the 3 that crashed still crash
on startup. I may purge and recreate them, but there's no hurry. I
have 18 OSDs per host and plenty of free space currently.
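For completeness, a rough sketch of the purge-and-recreate path mentioned above, with osd.7 and /dev/sdX as placeholder values; double-check the ID and device before running anything like this:

systemctl stop ceph-osd@7
ceph osd purge 7 --yes-i-really-mean-it
# wipe and redeploy the backing device
ceph-volume lvm zap /dev/sdX --destroy
ceph-volume lvm create --data /dev/sdX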
On Tue, Jul 23, 2019 at 2:19 AM Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote:
Have they been stable since, or have there still been some crashes?
Thanks
---- On Sat, 20 Jul 2019 10:09:08 +0800 Nigel Williams <nigel.williams@xxxxxxxxxxx> wrote ----
On Sat, 20 Jul 2019 at 04:28, Nathan Fish <lordcirth@xxxxxxxxx> wrote:
On further investigation, it seems to be this bug:
http://tracker.ceph.com/issues/38724
We just upgraded to 14.2.2 and had a dozen OSDs at 14.2.2 go down to this bug; we recovered them with:
systemctl reset-failed ceph-osd@160
systemctl start ceph-osd@160
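(For a batch of affected OSDs, a quick shell loop along these lines should do; the ID list is a placeholder:)

for id in 160 161 162; do    # placeholder OSD IDs
    systemctl reset-failed ceph-osd@$id
    systemctl start ceph-osd@$id
done
ceph osd stat    # confirm all OSDs are back up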
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com