Re: question about OSD failure detection

"Liu, Ming (HPIT-GADSC)" <ming.liu2@xxxxxx> · Mon, 13 Apr 2015 14:16:12 +0000

Thank you Xiaoxi very much!

Your answer is very clear. I need to read more about CRUSH first, but your answer already helps me a lot. In practice, having OSDs check other OSDs should be OK. A ‘cross product’ of heartbeat connections among OSDs is possible, but it can be avoided by designing the rules. So I think I first need to fully understand CRUSH and how PGs are mapped to OSDs.
My major concern is that if each OSD needs to check all the others, there will be too many heartbeats as the cluster grows. 10 OSDs need about 10*10 heartbeat messages every checking interval, which is OK, but 1000 OSDs need about 1000*1000 heartbeats, which seems like too many messages. But I think you just confirmed this will not happen.
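
To make the arithmetic concrete, here is a rough sketch comparing the all-to-all count with the per-OSD bound of #PG x (#Replica - 1) peers from the quoted reply below. The function names are made up for illustration, and the values of 100 PGs per OSD and 3 replicas are assumptions rather than measurements from a real cluster. The all-to-all count excludes an OSD pinging itself, so it is n*(n-1) rather than n*n.

```python
def all_to_all_heartbeats(num_osds: int) -> int:
    """Directed heartbeat pairs if every OSD pinged every other OSD."""
    return num_osds * (num_osds - 1)


def pg_bounded_heartbeats(num_osds: int, pgs_per_osd: int, replicas: int) -> int:
    """Upper bound on directed pairs if an OSD only pings OSDs it shares a PG with.

    Each OSD has at most pgs_per_osd * (replicas - 1) peers, and can never
    have more peers than there are other OSDs in the cluster.
    """
    peers_per_osd = min(pgs_per_osd * (replicas - 1), num_osds - 1)
    return num_osds * peers_per_osd


if __name__ == "__main__":
    # Assumed illustrative values: 100 PGs per OSD, 3 replicas.
    for n in (10, 1000):
        print(n, all_to_all_heartbeats(n), pg_bounded_heartbeats(n, 100, 3))
```

Under these assumptions the bound only helps once the cluster is large: with 10 OSDs every OSD can still end up pinging all 9 others, while with 1000 OSDs each one pings at most 200 peers instead of 999.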

I also noticed that all other thread titles have a prefix such as [ceph-user]. If this is a rule for this mailing list, I am sorry; I will follow it next time.

Thanks,
Ming

From: Chen, Xiaoxi [mailto:xiaoxi.chen@xxxxxxxxx]
Sent: Monday, April 13, 2015 2:32 PM
To: Liu, Ming (HPIT-GADSC); ceph-users@xxxxxxxxxxxxxx
Subject: RE: question about OSD failure detection

Hi,

1. In short, an OSD needs to heartbeat with up to #PG x (#Replica - 1) peers, but in practice far fewer, since most of those peers are duplicates.
For example, say an OSD (OSD 1) holds 100 PGs. For some of those PGs, say PG 1, OSD 1 is the primary OSD, so OSD 1 needs to peer with all other OSDs in PG 1’s acting set and up set (basically, you can think of these two sets as the other replicas of PG 1).

So if the cluster uses a very simple (default) ruleset, it is possible that an OSD needs to peer with all other OSDs.

2. An OSD randomly selects a mon when it boots up and keeps talking to that mon. It is the monitor quorum’s job to reach agreement on the OSD status. See Paxos if you want more details on how the agreement is reached.

3. OSDs do ping the mon, but in reality the network between the monitors and the OSDs is likely not the network between the OSDs. For instance, Mon <-> OSD may be on a management network, but OSD <-> OSD on a 10Gb data network. So pinging only the mon is not enough.

Actually, there are heartbeats on both the public and the cluster network, just to ensure network connectivity.
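
The point in item 1 about duplicate peers can be illustrated with a toy model. This is only a sketch, not Ceph's actual peer selection code: the `pg_to_osds` mapping and the `heartbeat_peers` helper are hypothetical, and every member of a PG is treated symmetrically for simplicity.

```python
from collections import defaultdict


def heartbeat_peers(pg_to_osds):
    """Map each OSD to the set of other OSDs it shares at least one PG with."""
    peers = defaultdict(set)
    for osds in pg_to_osds.values():
        for osd in osds:
            # Every other OSD holding a replica of this PG is a peer.
            peers[osd].update(other for other in osds if other != osd)
    return dict(peers)


# Hypothetical placement: each PG maps to 3 OSDs (3 replicas).
pg_to_osds = {
    "pg1": [1, 2, 3],
    "pg2": [1, 2, 4],
    "pg3": [1, 3, 4],
    "pg4": [2, 3, 5],
}

peers = heartbeat_peers(pg_to_osds)
print(sorted(peers[1]))  # [2, 3, 4]
```

Here OSD 1 holds 3 PGs, so the upper bound is 3 x (3 - 1) = 6 peers, but only 3 of them are distinct because the same OSDs appear in several of its PGs.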

Xiaoxi

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Liu, Ming (HPIT-GADSC)
Sent: Monday, April 13, 2015 12:08 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: question about OSD failure detection

Hi, all,

I am new to Ceph and am trying to understand how it works and how it is designed.
One basic question for me is how Ceph OSDs perform failure detection. I did some searching but could not find a satisfying answer, so I am asking here and hope someone can kindly help me.
The documentation says that an OSD sends heartbeats to other OSDs and reports a failure to the monitor when it detects that another OSD is down.
My questions are:
1.  How many other OSDs does one particular OSD heartbeat with? Is it possible for one OSD to heartbeat with most or even all of the other OSDs in the cluster? In other words, how does an OSD decide the list of OSDs that it needs to check for health?
2.  If an OSD detects a failure, which monitor does it report to? Is this selection random, or does it follow some rule? Or does it need to report to multiple monitors?
3.  Why doesn't each OSD heartbeat directly with a monitor?

If the answer to question 1 is that one OSD may need to check all other OSDs in the cluster, and if this is true for many OSDs in the cluster, it looks like a lot of redundant network traffic. Say OSD-1 checks OSD-2, OSD-3 and OSD-4, while OSD-2 also checks OSD-3, OSD-4 and OSD-5. Then both OSD-1 and OSD-2 do redundant health checking for OSD-3 and OSD-4.
If the answer to question 1 is that one OSD only needs to heartbeat with very few other OSDs and never ends up checking most of the others, then I am fine; this decentralizes health checking away from the monitor. But I want to know the rule by which an OSD decides which other OSDs it needs to check, to understand this further.

I have read almost all the articles I could find on the internet so far, but still cannot find a satisfying answer. I don’t want to dive into the source code yet, since that may take me a long time. I want to understand the principles first, and then decide whether it is really worth spending a lot of time reading the source code. So I really hope someone can help me here.

Any help will be much appreciated!

Thanks,
Ming

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com