Re: Primary mds failure

On Wed, 27 Jul 2011, Jojy Varghese wrote:
> Hi
>    We are observing that when the primary mds goes away (say as an OOM
> killer victim), the client keeps trying (forever) to write to it
> (try_write method in the messenger), which eventually results in a
> filesystem hang. So the questions are:
> 
>  - Why doesn't the kernel client attempt another mds?

As soon as another mds takes over for it, the client will connect to the 
new one.  (Unless there's a bug in the old ceph_connection cleanup.)
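For illustration, here is a minimal user-space sketch of that reaction to 
an mds map update.  The names (mds_map, client_conn, handle_mdsmap) are 
hypothetical and this is not the actual kernel client code; it only shows 
the idea that a session is reopened when the active daemon for the rank 
changes:

    /* Sketch only: hypothetical types, not the Ceph kernel client. */
    #include <stdio.h>
    #include <string.h>

    struct mds_map {
        int epoch;
        char active_addr[32];   /* address of the daemon holding the rank */
    };

    struct client_conn {
        char peer_addr[32];     /* mds we currently have a session with */
        int connected;
    };

    /* Reconnect only when the active daemon for our rank has changed
     * (e.g. a standby took over after the old primary died). */
    static void handle_mdsmap(struct client_conn *conn,
                              const struct mds_map *newmap)
    {
        if (conn->connected &&
            strcmp(conn->peer_addr, newmap->active_addr) == 0)
            return;  /* same daemon still active; nothing to do */

        printf("epoch %d: mds changed (%s -> %s), reopening session\n",
               newmap->epoch, conn->peer_addr, newmap->active_addr);
        conn->connected = 0;                            /* drop old conn  */
        strcpy(conn->peer_addr, newmap->active_addr);   /* connect to new */
        conn->connected = 1;
    }

    int main(void)
    {
        struct client_conn conn = { .peer_addr = "10.0.0.1:6800",
                                    .connected = 1 };
        struct mds_map m1 = { .epoch = 2, .active_addr = "10.0.0.1:6800" };
        struct mds_map m2 = { .epoch = 3, .active_addr = "10.0.0.2:6800" };

        handle_mdsmap(&conn, &m1);  /* no change: stays put     */
        handle_mdsmap(&conn, &m2);  /* failover: reopens session */
        return 0;
    }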

>  - Is replication (mds) guaranteed to take place before the primary
> mds goes down? In other words, is replication done preemptively or due
> to a trigger (scheduled or event-based)?

The MDS journals updates to the object store (where the objects are 
replicated by multiple osds).  The MDS is careful to inform the client 
which operations have committed and to prevent leakage of uncommitted 
information from one client to another.  On reconnect, clients replay 
their uncommitted state (by resending requests and re-writing back dirty 
cap/inode metadata).
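A minimal user-space sketch of that replay idea follows, with hypothetical 
names (mds_request, handle_commit_ack, replay_uncommitted) rather than the 
real Ceph data structures: the client keeps each request until the mds 
acknowledges the journal commit, and anything still unacknowledged is 
simply resent after reconnect.

    /* Sketch only: hypothetical types, not the Ceph source. */
    #include <stdio.h>

    #define MAX_REQ 8

    struct mds_request {
        int tid;        /* transaction id */
        int committed;  /* mds acknowledged the journal commit */
    };

    struct session {
        struct mds_request reqs[MAX_REQ];
        int nreq;
    };

    /* Mark a request committed once the mds reports it safe on disk. */
    static void handle_commit_ack(struct session *s, int tid)
    {
        for (int i = 0; i < s->nreq; i++)
            if (s->reqs[i].tid == tid)
                s->reqs[i].committed = 1;
    }

    /* On reconnect, resend whatever the old mds never committed. */
    static void replay_uncommitted(struct session *s)
    {
        for (int i = 0; i < s->nreq; i++)
            if (!s->reqs[i].committed)
                printf("resending request tid %d to new mds\n",
                       s->reqs[i].tid);
    }

    int main(void)
    {
        struct session s = { .reqs = { {1, 0}, {2, 0}, {3, 0} }, .nreq = 3 };

        handle_commit_ack(&s, 1);   /* only tid 1 made it to the journal */
        replay_uncommitted(&s);     /* tids 2 and 3 are replayed         */
        return 0;
    }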

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

