Re: If one node lost connect to replication network?

Thanks for the quick reply.
OK, so for now it looks better to avoid splitting the networks across separate network interfaces.
Where can I find a list of all known issues for a specific version?


On Mon, Mar 11, 2013 at 5:16 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Monday, March 11, 2013, Igor Laskovy wrote:
Hi there!

I have a Ceph FS cluster, version 0.56.3. It is 3 nodes with XFS on the disks and minimal options in ceph.conf. It is in my lab, and I am doing some crash testing.
One of the several tests is losing the connection to the replication network only.
What is the expected behavior in this situation? Will the mounted disk on the client machine freeze, or something like that?

It looks like in my case the whole cluster has gone crazy.

Yeah, this is a known issue with the way Ceph determines if nodes are up or down. Basically the OSDs are communicating over the replication network and reporting to the monitors that the disconnected node is dead, but when the monitors mark it down, the disconnected node finds out and insists (over the public network) that it's up.
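
To make the loop described above concrete, here is a toy model in Python. It is only a sketch and not Ceph code: the function name `simulate` and its parameters are made up for illustration, and it tracks a single disconnected OSD. It assumes that peers report the OSD as failed whenever the replication link is broken, and that the OSD re-asserts itself as up whenever it can still reach the monitors over the public network.

```python
def simulate(rounds, cluster_link_ok, public_link_ok=True):
    """Toy model of one OSD's state as seen by the monitors.

    Hypothetical names throughout; this is not how Ceph is implemented.
    Returns the list of state transitions recorded by the monitors.
    """
    up = True
    history = []
    for _ in range(rounds):
        # Peers cannot heartbeat the OSD over the replication network,
        # so they report it as failed and the monitors mark it down.
        if up and not cluster_link_ok:
            up = False
            history.append("down")
        # The OSD learns it was marked down, but it can still reach the
        # monitors over the public network, so it insists that it is up.
        if not up and public_link_ok:
            up = True
            history.append("up")
    return history


# Replication link lost, public link fine: the OSD flaps every round.
print(simulate(2, cluster_link_ok=False))  # ['down', 'up', 'down', 'up']
```

With both links lost, the same model marks the OSD down once and leaves it down, which is the behavior one would normally expect from a dead node.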

I believe Sage fixed this issue in our development releases, but I could be misremembering. Sage?
-Greg



--
Igor Laskovy
facebook.com/igor.laskovy
Kiev, Ukraine
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
