Primary mds failure

Jojy Varghese <jojy.varghese@xxxxxxxxx> · Wed, 27 Jul 2011 13:26:40 -0700

Hi,
   We are observing that when the primary mds goes away (say, as an OOM
killer victim), the client keeps trying (forever) to write to it (the
try_write method in the messenger), which eventually results in a
filesystem hang. So the questions are:

 - Why doesn't the kernel client attempt another mds?
 - Is mds replication guaranteed to take place before the primary
mds goes down? In other words, is replication done preemptively, or is
it driven by a trigger (scheduled or event-based)?

thanks again
Jojy