question about OSD failure detection

Hi, all,

 

I am new to Ceph and am trying to understand how it is designed and how it works.

One basic question I have is how Ceph OSDs perform failure detection. I did some searching but could not find a satisfying answer, so I am asking here in the hope that someone can help.

The documentation says that an OSD sends heartbeats to other OSDs and reports a failure to the monitor when it detects that another OSD is down.

My questions are:

1.  How many other OSDs does a given OSD send heartbeats to? Is it possible for one OSD to heartbeat most, or even all, of the other OSDs in the cluster? In other words, how does an OSD decide which OSDs it needs to health-check?

2.  If an OSD detects a failure, which monitor does it report to? Is the selection random, or does it follow some rule? Or does it need to report to multiple monitors?

3.  Why doesn't each OSD heartbeat directly with a monitor?

If the answer to question 1 is that one OSD may need to check all other OSDs in the cluster, and this is true for many OSDs, then it looks like a lot of redundant network traffic. Say OSD-1 checks OSD-2, OSD-3, and OSD-4, while OSD-2 also checks OSD-3, OSD-4, and OSD-5. Then OSD-1 and OSD-2 both do redundant health checking for OSD-3 and OSD-4.

If the answer to question 1 is that one OSD only needs to heartbeat a few other OSDs and can never end up checking most of the cluster, then I am fine with it, since this decentralizes health checking away from the monitor. But I would like to know the rule an OSD uses to decide which other OSDs to check, so I can understand this further.
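
To make the traffic concern concrete, here is a rough sketch of the arithmetic in Python. The function names and the peer count of 3 are my own assumptions for illustration only; they are not taken from Ceph, and I do not know how many peers a real OSD actually checks.

```python
def full_mesh_links(num_osds):
    """Directed heartbeat links if every OSD checked every other OSD."""
    return num_osds * (num_osds - 1)


def limited_links(num_osds, peers_per_osd):
    """Directed heartbeat links if every OSD checked at most a fixed number of peers.

    peers_per_osd is a hypothetical value, not a Ceph setting.
    """
    return num_osds * min(peers_per_osd, num_osds - 1)


if __name__ == "__main__":
    # Compare how the two schemes grow with cluster size.
    for n in (10, 100, 1000):
        print(n, full_mesh_links(n), limited_links(n, 3))
```

With all-to-all checking the number of links grows with the square of the cluster size, while a fixed peer count grows only linearly, which is why I want to understand which case applies.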

 

I have read almost every article I could find on the internet so far, but I still cannot get a satisfying answer. I don’t want to dive into the source code yet, since that may take me a long time. I want to understand the principles first, and then decide whether it is really worth spending a lot of time reading the source. So I really hope someone can help me here.

Any help would be greatly appreciated!

 

Thanks,

Ming

