On Wed, 27 Jul 2011, Jojy Varghese wrote:
> Hi
>   We are observing that when the primary mds goes away (say, as an OOM
> killer victim), the client keeps trying (forever) to write to it
> (try_write method in the messenger), which eventually results in a
> filesystem hang. So the question is:
>
> - Why doesn't the kernel client attempt another mds?

As soon as another mds takes over for it the client will connect to them.
(Unless there's a bug in the old ceph_connection cleanup.)

> - Is replication (mds) guaranteed to take place before the primary
> mds goes down? In other words, is replication done preemptively or due
> to a trigger (scheduled or event based)?

The MDS journals updates to the object store (where the objects are
replicated across multiple OSDs). The MDS is careful to inform the client
which operations have committed and to prevent leakage of uncommitted
information from one client to another. On reconnect, clients replay
their uncommitted state (by resending requests and re-writing back dirty
cap/inode metadata).

sage
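
For anyone skimming the thread, the commit-ack/replay idea above can be sketched in a few lines. This is a toy model, not Ceph code: the client retains every request until the MDS acknowledges it as committed to the journal, and on reconnect after a failover it resends whatever is still pending. All names here (ToyMDS, ToyClient, etc.) are illustrative assumptions, and the "applied"/"committed" split is a simplification of the real unsafe/safe reply protocol.

```python
class ToyMDS:
    """Stand-in for an MDS that journals requests and acks commits."""

    def __init__(self):
        self.journal = []  # requests that are durable in the object store

    def handle(self, req, commit=True):
        if commit:
            self.journal.append(req)
            return "committed"  # safe reply: client may forget the request
        return "applied"        # unsafe reply: client must retain it


class ToyClient:
    """Keeps uncommitted requests so they can be replayed on reconnect."""

    def __init__(self):
        self.uncommitted = {}  # tid -> request, awaiting a commit ack
        self.next_tid = 0

    def send(self, mds, req, commit=True):
        tid = self.next_tid
        self.next_tid += 1
        self.uncommitted[tid] = req
        if mds.handle(req, commit) == "committed":
            # Safe ack received: no replay state needed for this request.
            del self.uncommitted[tid]
        return tid

    def reconnect(self, new_mds):
        """Replay everything the failed MDS never committed, in tid order."""
        for tid in sorted(self.uncommitted):
            new_mds.handle(self.uncommitted[tid], commit=True)
        replayed = len(self.uncommitted)
        self.uncommitted.clear()
        return replayed
```

So a request the old MDS only "applied" (never journaled) survives an OOM-killed MDS because the client still holds it and resends it to the standby on reconnect.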