Hi there! I have a Ceph FS cluster running version 0.56.3. It is a three-node lab setup with XFS on the disks and minimal options in ceph.conf, and I am doing some crash testing. One of several tests is losing the connection to the replication network only. What is the expected behavior in this situation? Will the mounted disk on the client machine freeze, or something like that? It looks like in my case the whole cluster has gone crazy.
Yeah, this is a known issue with the way Ceph determines if nodes are up or down. Basically, the other OSDs communicate over the replication network and report to the monitors that the disconnected node is dead, but when the monitors mark it down, the node finds out and insists (over the public network) that it's up.
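To make the up/down loop described above concrete, here is a minimal toy model in Python. It is not Ceph code and does not use any Ceph API; the function name `simulate_flapping` and its parameters are hypothetical, chosen only for illustration. It assumes a single OSD whose peers judge it by the replication network while the OSD itself talks to the monitor over the public network.

```python
# Toy model only: this is NOT Ceph code, and every name here is hypothetical.

def simulate_flapping(rounds, cluster_link_up=False, public_link_up=True):
    """Return the sequence of state changes a monitor would record for one OSD."""
    history = []
    marked_up = True
    for _ in range(rounds):
        # Peers check the OSD over the replication (cluster) network.
        # If that link is broken, they report it and the monitor marks it down.
        if marked_up and not cluster_link_up:
            marked_up = False
            history.append("down")
        # The OSD still reaches the monitor over the public network,
        # notices it was marked down, and insists that it is alive.
        if not marked_up and public_link_up:
            marked_up = True
            history.append("up")
    return history


if __name__ == "__main__":
    # Replication network lost, public network intact: the OSD flaps.
    print(simulate_flapping(3))
```

With only the replication link broken, the state alternates between down and up every round, which is the flapping behavior described in the reply. If the public link is also lost in this model, the OSD is marked down once and stays down.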
I believe Sage fixed this issue in our development releases, but I could be misremembering. Sage?
-Greg