Thanks, Dan. I thought so but wanted to verify. I'll see if I can work up a doc PR to clarify this.

>> The documentation here:
>>
>> http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/
>>
>> says
>>
>> "Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6 seconds"
>>
>> and
>>
>> "If a neighboring Ceph OSD Daemon doesn't show a heartbeat within a 20 second grace period, the Ceph OSD Daemon may consider the neighboring Ceph OSD Daemon down and report it back to a Ceph Monitor."
>>
>> I've always thought that each OSD heartbeats with *every* other OSD, which of course means that total heartbeat traffic grows roughly quadratically with cluster size. However, in extended testing we've observed that the number of other OSDs that a given OSD heartbeats with is < N, which has us wondering if perhaps only OSDs with which a given OSD shares PGs are contacted -- or some other subset.
>>
>
> OSDs heartbeat with their peers: the set of OSDs with which they share
> at least one PG.
>
> You can see the heartbeat peers (HB_PEERS) in ceph pg dump -- after
> the header "OSD_STAT USED AVAIL TOTAL HB_PEERS..."
>
> This is one of the nice features of the placement group concept --
> per-OSD heartbeat and peering load is bounded by the number of PGs per
> OSD, which stays roughly constant, rather than scaling up with the
> total number of OSDs in the cluster.
>
> Cheers, Dan
>
>> I plan to submit a doc fix for mon_osd_min_down_reporters and wanted to resolve this FUD first.
>>
>> -- aad
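
P.S. For anyone who wants to check the peer counts on their own cluster, below is a minimal sketch that pulls HB_PEERS out of the JSON form of ceph pg dump. The field names ("osd_stats", "hb_peers") and their nesting are assumptions that vary across Ceph releases (some nest the stats under "pg_map"), so treat it as a starting point, not gospel.

    #!/usr/bin/env python3
    # Sketch: list each OSD's heartbeat peers from `ceph pg dump`.
    # Assumption: the JSON output carries an "osd_stats" list whose
    # entries have "osd" and "hb_peers" fields; the exact nesting
    # varies by Ceph release, so adjust the lookup for your version.
    import json
    import subprocess

    raw = subprocess.check_output(["ceph", "pg", "dump", "--format", "json"])
    dump = json.loads(raw)

    # Tolerate both the flat and the "pg_map"-nested layouts.
    osd_stats = dump.get("osd_stats") or dump.get("pg_map", {}).get("osd_stats", [])

    for stat in osd_stats:
        peers = sorted(stat.get("hb_peers", []))
        print("osd.{}: {} heartbeat peers: {}".format(stat["osd"], len(peers), peers))

If Dan's description holds, each OSD's peer count should track its PG count rather than the total number of OSDs in the cluster.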