The "daemons have recently crashed" warning comes from the crash module's RECENT_CRASH health check, which is new in 14.2.5. Once you have fixed the underlying issue, you need to mark/archive the crash entries as described here: https://docs.ceph.com/docs/master/mgr/crash/
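For reference, a minimal sketch of that workflow, assuming the crash module is enabled; the <crash-id> values below are placeholders, take the real IDs from the ls output:

    # list crash entries that have not yet been acknowledged
    ceph crash ls-new

    # inspect the details of a single entry
    ceph crash info <crash-id>

    # archive one entry, or all outstanding entries at once
    ceph crash archive <crash-id>
    ceph crash archive-all

Once all entries are archived, the warning should clear and the cluster should return to HEALTH_OK (assuming nothing else is wrong).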
---- On Fri, 10 Jan 2020 17:37:47 +0800 Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote ----
Hi,
Last week I upgraded our ceph to 14.2.5 (from 14.2.4), and either during the procedure or shortly after it some OSDs crashed. I re-initialised them, which I thought would be enough to fix everything.
Looking a bit further, I do see a lot of lines like this (which are worrying, I suppose):
ceph.log:2020-01-10 10:06:41.049879 mon.cephmon3 (mon.0) 234423 : cluster [DBG] osd.97 reported immediately failed by osd.67
and similar lines naming these OSDs:
osd.109
osd.133
osd.139
osd.111
osd.38
osd.65
osd.38
osd.65
osd.97
Now everything seems to be OK, but the WARN status remains. Is this a
"feature" of 14.2.5 or am I missing something?
Below is the output of `ceph -s`:
Cheers
/Simon
10:13 [root@cephmon1 ~]# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_WARN
            3 daemons have recently crashed

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 27h)
    mgr: cephmon3(active, since 27h), standbys: cephmon1, cephmon2
    mds: cephfs:1 {0=cephmds1=up:active} 1 up:standby
    osd: 168 osds: 168 up (since 6m), 168 in (since 3d); 11 remapped pgs

  data:
    pools:   10 pools, 5216 pgs
    objects: 167.61M objects, 134 TiB
    usage:   245 TiB used, 1.5 PiB / 1.8 PiB avail
    pgs:     1018213/1354096231 objects misplaced (0.075%)
             5203 active+clean
             10   active+remapped+backfill_wait
             2    active+clean+scrubbing+deep
             1    active+remapped+backfilling

  io:
    client:   149 MiB/s wr, 0 op/s rd, 55 op/s wr
    recovery: 0 B/s, 30 objects/s
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com