On Thu, Jan 26, 2017 at 8:18 AM, Burkhard Linke
<Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
>
> we are running two MDS servers in active/standby-replay setup. Recently we
> had to disconnect active MDS server, and failover to standby works as
> expected.
>
>
> The filesystem currently contains over 5 million files, so reading all the
> metadata information from the data pool took too long, since the information
> was not available on the OSD page caches. The MDS was timed out by the mons,
> and a failover switch to the former active MDS (which was available as
> standby again) happened. This MDS in turn had to read the metadata, again
> running into a timeout, failover, etc. I resolved the situation by disabling
> one of the MDS, which kept the mons from failing the now solely available
> MDS.

The MDS does not re-read every inode on startup -- rather, it replays
its journal (the overall number of files in your system does not
factor into this).

> So given a large filesystem, how do I prevent failover flapping between MDS
> instances that are in the rejoin state and reading the inode information?

The monitor's decision to fail an unresponsive MDS is based on the MDS
not sending a beacon to the mon -- there is no limit on how long an
MDS is allowed to stay in a given state (such as rejoin).

So there are two things to investigate here:
 * Why is the MDS taking so long to start?
 * Why is the MDS failing to send beacons to the monitor during
   whatever process is taking it so long?

The answer to both is likely to be found in an MDS log with the debug
level turned up, gathered as it starts up.

John

>
> Regards,
> Burkhard
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
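
To illustrate the point made in the reply, that the mon fails an MDS for
missing beacons and not for spending a long time in a state such as rejoin,
here is a minimal Python sketch. It is a toy model, not Ceph code: the
names MdsRecord and should_fail are hypothetical, and the 15 second grace
period is an assumed value chosen for the example (Ceph has a beacon grace
setting, but check your own cluster's configuration for the real number).

```python
from dataclasses import dataclass


@dataclass
class MdsRecord:
    """Hypothetical view of what a mon tracks for one MDS (illustration only)."""
    name: str
    state: str            # e.g. "replay", "rejoin", "active"
    state_entered: float  # time the MDS entered its current state, in seconds
    last_beacon: float    # time the mon last received a beacon, in seconds


def should_fail(mds: MdsRecord, now: float, beacon_grace: float = 15.0) -> bool:
    """Return True if the MDS would be treated as unresponsive.

    Only the age of the last beacon is checked. The current state and the
    time spent in it are deliberately ignored, mirroring the statement that
    there is no limit on how long an MDS may stay in a given state.
    beacon_grace is an assumed value for this sketch.
    """
    return (now - mds.last_beacon) > beacon_grace


now = 600.0

# In rejoin for ten minutes, but a beacon arrived 2 seconds ago: not failed.
slow_but_alive = MdsRecord("mds.a", "rejoin", state_entered=0.0, last_beacon=598.0)

# In rejoin for only 40 seconds, but silent for 30 seconds: failed.
silent = MdsRecord("mds.b", "rejoin", state_entered=560.0, last_beacon=570.0)

for mds in (slow_but_alive, silent):
    print(mds.name, mds.state, "fail" if should_fail(mds, now) else "keep")
```

In this model the flapping described in the original report corresponds to
the second case: the daemon is busy enough during startup that its beacons
stop arriving, which is why the second question above (why beacons are not
being sent) matters as much as the first.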