Thanks Lincoln,
indeed, as I said, the cluster is recovering, so there are pending ops:
pgs: 21.034% pgs not active
1692310/24980804 objects degraded (6.774%)
5612149/24980804 objects misplaced (22.466%)
458 active+clean
329 active+remapped+backfill_wait
159 activating+remapped
100 active+undersized+degraded+remapped+backfill_wait
58 activating+undersized+degraded+remapped
27 activating
22 active+undersized+degraded+remapped+backfilling
6 active+remapped+backfilling
1 active+recovery_wait+degraded
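(For reference, a quick way to see just the PGs that are still not active while the recovery runs; a minimal sketch using the standard luminous CLI, nothing below is specific to this cluster:)

ceph pg dump_stuck inactive     # list PGs stuck in a non-active state
ceph health detail              # health summary, including degraded/inactive PG warnings

ceph pg dump_stuck also accepts unclean, degraded, undersized and stale if you want the other buckets.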
If it's just a matter of waiting for the system to complete the recovery,
that's fine, I'll deal with that, but I was wondering if there is a
more subtle problem here.
OK, I'll wait for the recovery to complete and see what happens, thanks.
Cheers,
Alessandro
On 08/01/18 17:36, Lincoln Bryant wrote:
Hi Alessandro,
What is the state of your PGs? Inactive PGs have blocked CephFS
recovery on our cluster before. I'd try to clear any blocked ops and
see if the MDSes recover.
--Lincoln
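(For context, a sketch of how blocked/slow requests can usually be inspected with the standard CLI; the osd id 12 below is only an example, substitute a real one and run the daemon command on that OSD's host:)

ceph health detail                    # shows which OSDs are reporting slow/blocked requests
ceph osd blocked-by                   # OSDs that are blocking peering of other OSDs
ceph daemon osd.12 dump_blocked_ops   # per-daemon dump of the ops that are stuck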
On Mon, 2018-01-08 at 17:21 +0100, Alessandro De Salvo wrote:
Hi,
I'm running Ceph Luminous 12.2.2 and my CephFS suddenly degraded.
I have 2 active MDS instances and 1 standby. All the active instances
are now in replay state and show the same error in the logs:
---- mds1 ----
2018-01-08 16:04:15.765637 7fc2e92451c0 0 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process (unknown), pid 164
starting mds.mds1 at -
2018-01-08 16:04:15.785849 7fc2e92451c0 0 pidfile_write: ignore empty --pid-file
2018-01-08 16:04:20.168178 7fc2e1ee1700 1 mds.mds1 handle_mds_map standby
2018-01-08 16:04:20.278424 7fc2e1ee1700 1 mds.1.20635 handle_mds_map i am now mds.1.20635
2018-01-08 16:04:20.278432 7fc2e1ee1700 1 mds.1.20635 handle_mds_map state change up:boot --> up:replay
2018-01-08 16:04:20.278443 7fc2e1ee1700 1 mds.1.20635 replay_start
2018-01-08 16:04:20.278449 7fc2e1ee1700 1 mds.1.20635 recovery set is 0
2018-01-08 16:04:20.278458 7fc2e1ee1700 1 mds.1.20635 waiting for osdmap 21467 (which blacklists prior instance)
---- mds2 ----
2018-01-08 16:04:16.870459 7fd8456201c0 0 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process (unknown), pid 295
starting mds.mds2 at -
2018-01-08 16:04:16.881616 7fd8456201c0 0 pidfile_write: ignore empty --pid-file
2018-01-08 16:04:21.274543 7fd83e2bc700 1 mds.mds2 handle_mds_map standby
2018-01-08 16:04:21.314438 7fd83e2bc700 1 mds.0.20637 handle_mds_map i am now mds.0.20637
2018-01-08 16:04:21.314459 7fd83e2bc700 1 mds.0.20637 handle_mds_map state change up:boot --> up:replay
2018-01-08 16:04:21.314479 7fd83e2bc700 1 mds.0.20637 replay_start
2018-01-08 16:04:21.314492 7fd83e2bc700 1 mds.0.20637 recovery set is 1
2018-01-08 16:04:21.314517 7fd83e2bc700 1 mds.0.20637 waiting for osdmap 21467 (which blacklists prior instance)
2018-01-08 16:04:21.393307 7fd837aaf700 0 mds.0.cache creating system inode with ino:0x100
2018-01-08 16:04:21.397246 7fd837aaf700 0 mds.0.cache creating system inode with ino:0x1
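(Both daemons stop at "waiting for osdmap 21467 (which blacklists prior instance)"; a sketch of how to check whether that epoch has actually been published and what each MDS reports, assuming the mds1/mds2 names above and the default admin socket paths:)

ceph osd stat                   # current osdmap epoch as seen by the monitors
ceph daemon mds.mds1 status     # run on the mds1 host: reports its state (e.g. up:replay) and osdmap epoch
ceph daemon mds.mds2 status     # same for mds2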
The cluster is recovering as we are changing some of the OSDs, and
there are a few slow/stuck requests, but I'm not sure whether this is
the cause, as there is apparently no data loss (so far).
How can I force the MDSes to quit the replay state?
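(For reference, the rank states can be watched from the monitor side while the recovery proceeds; a minimal sketch using the standard CLI:)

ceph mds stat                   # one-line summary of MDS ranks and their states
ceph fs dump                    # full FSMap, including which ranks are in up:replay and which standbys exist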
Thanks for any help,
Alessandro
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com