On 10/01/2020 10:41, Ashley Merrick wrote:
> Once you have fixed the issue you need to mark / archive the crash
> entries as seen here: https://docs.ceph.com/docs/master/mgr/crash/

Hi Ashley,

thanks, I didn't know this before... It turned out there were quite a
few old crashes (since I never archived them), and of the three most
recent ones, two were like this:

"assert_msg": "/build/ceph-14.2.5/src/common/ceph_time.h: In function
'ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)'
thread 7fbda425a700 time 2020-01-02 17:37:56.885082\n
/build/ceph-14.2.5/src/common/ceph_time.h: 485: FAILED
ceph_assert(z >= signedspan::zero())\n",

And another one was too big to paste here ;-)

I did a `ceph crash archive-all` and now ceph is OK again :-)

Cheers

/Simon

> ---- On Fri, 10 Jan 2020 17:37:47 +0800 *Simon Oosthoek
> <s.oosthoek@xxxxxxxxxxxxx>* wrote ----
>
> Hi,
>
> last week I upgraded our ceph to 14.2.5 (from 14.2.4) and either during
> the procedure or shortly after that, some osds crashed. I re-initialised
> them and thought that would be enough to fix everything.
>
> I looked a bit further and I do see a lot of lines like this (which are
> worrying, I suppose):
>
> ceph.log:2020-01-10 10:06:41.049879 mon.cephmon3 (mon.0) 234423 :
> cluster [DBG] osd.97 reported immediately failed by osd.67
>
> osd.109
> osd.133
> osd.139
> osd.111
> osd.38
> osd.65
> osd.38
> osd.65
> osd.97
>
> Now everything seems to be OK, but the WARN status remains. Is this a
> "feature" of 14.2.5 or am I missing something?
>
> Below is the output of `ceph -s`:
>
> Cheers
>
> /Simon
>
> 10:13 [root@cephmon1 ~]# ceph -s
>   cluster:
>     id:     b489547c-ba50-4745-a914-23eb78e0e5dc
>     health: HEALTH_WARN
>             3 daemons have recently crashed
>
>   services:
>     mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 27h)
>     mgr: cephmon3(active, since 27h), standbys: cephmon1, cephmon2
>     mds: cephfs:1 {0=cephmds1=up:active} 1 up:standby
>     osd: 168 osds: 168 up (since 6m), 168 in (since 3d); 11 remapped pgs
>
>   data:
>     pools:   10 pools, 5216 pgs
>     objects: 167.61M objects, 134 TiB
>     usage:   245 TiB used, 1.5 PiB / 1.8 PiB avail
>     pgs:     1018213/1354096231 objects misplaced (0.075%)
>              5203 active+clean
>              10   active+remapped+backfill_wait
>              2    active+clean+scrubbing+deep
>              1    active+remapped+backfilling
>
>   io:
>     client:   149 MiB/s wr, 0 op/s rd, 55 op/s wr
>     recovery: 0 B/s, 30 objects/s

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
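For anyone who lands on this thread with the same "daemons have recently
crashed" warning: the workflow behind the docs link above comes down to a
few mgr commands. A minimal sketch, assuming Nautilus 14.2.5 or later with
the crash module enabled; `<crash-id>` is a placeholder for an ID as
printed by the first command:

    # list crashes that have not been acknowledged yet
    # (these are what trigger "N daemons have recently crashed")
    ceph crash ls-new

    # show the full report for one crash, including the assert_msg
    ceph crash info <crash-id>

    # acknowledge a single crash so it stops raising the warning
    ceph crash archive <crash-id>

    # or acknowledge everything at once, as done above
    ceph crash archive-all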