Hi,
we are running two MDS servers in an active/standby-replay setup.
Recently we had to disconnect the active MDS server, and the failover to
the standby worked as expected.
However, the filesystem currently contains over 5 million files, so
reading all the metadata information from the data pool took too long,
since the information was not available in the OSD page caches. The MDS
was timed out by the mons, and a failover back to the former active MDS
(which was available as standby again) happened. This MDS in turn had to
read the metadata, again running into a timeout, failover, etc. I
resolved the situation by disabling one of the MDS instances, which kept
the mons from failing the sole remaining MDS.
So given a large filesystem, how do I prevent failover flapping between
MDS instances that are in the rejoin state and reading the inode
information?
Regards,
Burkhard