2011/3/23 huang jun <hjwsm1989@xxxxxxxxx>:
> Hi all,
> There are two mds in the ceph cluster, one is active and the other is
> standby. In my test, I found the mds0 was marked as laggy, and it
> was taken over by the standby soon. And it will take a long time for
> the standby to become active if there are a great many requests
> from the client. I want to know under what circumstances mds would be
> marked as laggy.

The MDS gets marked laggy if it goes too long without sending a "beacon" to the monitors. This generally happens if the MDS gets overloaded by client requests for some reason -- or if it simply crashes. Your config looks okay, so either your MDS doesn't have the resources it needs for the workload you're using, or the workload breaks our default config/algorithms.

The amount of time it takes for a standby to take over is generally determined by 3 things:
1) Time to declare an mds down (this is when it's marked laggy)
2) Time to replay the MDS journal
3) Time to handle client replay requests

Usually (2) and (3) are dominated by (1), and I'm surprised this isn't the case for you... What does your workload look like?
-Greg
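
For illustration only, here is a rough Python sketch of the laggy check and the takeover-time breakdown described above. It is not Ceph's actual code: the `BeaconTracker` class, the `takeover_time` function, and the 15-second grace value in the usage lines are hypothetical, chosen just to show the idea that a daemon is considered laggy once the time since its last beacon exceeds a grace period.

```python
from dataclasses import dataclass


@dataclass
class BeaconTracker:
    """Hypothetical monitor-side record of one MDS's beacons."""
    grace: float              # assumed: seconds allowed between beacons
    last_beacon: float = 0.0  # timestamp of the most recent beacon

    def record_beacon(self, now: float) -> None:
        # Called whenever a beacon arrives from the MDS.
        self.last_beacon = now

    def is_laggy(self, now: float) -> bool:
        # Laggy once the silence since the last beacon exceeds the grace period.
        return now - self.last_beacon > self.grace


def takeover_time(detect: float, journal_replay: float, client_replay: float) -> float:
    # Total standby takeover time is the sum of the three phases:
    # (1) declaring the mds down, (2) journal replay, (3) client replay.
    return detect + journal_replay + client_replay


tracker = BeaconTracker(grace=15.0)   # 15.0 is an assumed value
tracker.record_beacon(now=100.0)
laggy_early = tracker.is_laggy(now=110.0)  # 10 s of silence, within grace
laggy_late = tracker.is_laggy(now=116.0)   # 16 s of silence, past grace
total = takeover_time(detect=15.0, journal_replay=2.0, client_replay=3.0)
```

In this made-up example phase (1) accounts for most of the total; if journal replay or client replay grows with a heavy request load, the sum is no longer dominated by the detection time.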