Re: OOB message roll into Messenger interface

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 6, 2016 at 10:06 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Tue, 6 Sep 2016, Haomai Wang wrote:
>> On Tue, Sep 6, 2016 at 9:17 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > Hi Haomai!
>> >
>> > On Sun, 4 Sep 2016, Haomai Wang wrote:
>> >> Background:
>> >> Each osd has two heartbeat messenger instances to maintain front/back
>> >> network available. It brings lots of connections and messages overhead
>> >> in scale out cluster. Actually we can combine these heartbeat
>> >> exchanges to public/cluster messengers to reduce tons of
>> >> connections(resources).
>> >>
>> >> Then heartbeat message should be OOB and shared the same thread/socket
>> >> with normal message channel. So it can exactly represent the heartbeat
>> >> role for real IO message. Otherwise, heartbeat channel's status can't
>> >> indicate the real IO message channel status. Because different socket
>> >> uses different send buffer/recv buffer, if real io message blocked,
>> >> oob message may be healthy.
>> >>
>> >> Besides OSD's heartbeat things, we have logic PING/PONG lived in
>> >> Objecter Ping/WatchNotify Ping etc. For the same goal, they could
>> >> share the heartbeat message.
>> >>
>> >> In a real rbd use case env, if we combines these ping/pong messages,
>> >> thousands of messages could be avoided which means lots of resources.
>> >>
>> >> As we reduce the heartbeat overhead, we can reduce heartbeat interval
>> >> and increase frequency which help a lot to the accurate of cluster
>> >> failure detection!
>> >
>> > I'm very excited to see this move forward!
>> >
>> >> Design:
>> >>
>> >> As discussed in Raleigh, we could defines these interfaces:
>> >>
>> >> int Connection::register_oob_message(identitfy_op, callback, interval);
>> >>
>> >> Users like Objecter linger ping could register a "callback" which
>> >> generate bufferlist used to be carried by heartbeat message.
>> >> "interval" indicate the user's oob message's send interval.
>> >>
>> >> "identitfy_op" indicates who can handle the oob info in peer side.
>> >> Like "Ping", "OSDPing" or "LingerPing" as the current message define.
>> >
>> > This looks convenient for the simpler callers, but I worry it won't work
>> > as well for OSDPing. There's a bunch of odd locking around the heartbeat
>> > info and the code already exists to do the the heartbeat sends.  I'm not
>> > sure it will simplify to a simple interval.
>>
>> Hmm, I'm not sure what's the odd locking thing refer to. As we can
>> register callback when adding new peer and unregister callback when
>> removing peer from "heartbeat_peers".
>>
>> The main send message construct callback extract from this loop:
>>   for (map<int,HeartbeatInfo>::iterator i = heartbeat_peers.begin();
>>        i != heartbeat_peers.end();
>>        ++i) {
>>     int peer = i->first;
>>     i->second.last_tx = now;
>>     if (i->second.first_tx == utime_t())
>>       i->second.first_tx = now;
>>     dout(30) << "heartbeat sending ping to osd." << peer << dendl;
>>     i->second.con_back->send_message(new MOSDPing(monc->get_fsid(),
>>  service.get_osdmap()->get_epoch(),
>>  MOSDPing::PING,
>>  now));
>>
>>     if (i->second.con_front)
>>       i->second.con_front->send_message(new MOSDPing(monc->get_fsid(),
>>     service.get_osdmap()->get_epoch(),
>>     MOSDPing::PING,
>>     now));
>>   }
>>
>> Only "fsid", "osdmap epoch" are required, I don't think it will block.
>> Then I think lots of locking/odding things exists on heartbeat
>> dispatch/handle process. sending process is clear I guess.
>
> Yeah, I guess that's fine.  I was worried about some dependency between
> who we ping and the osdmap epoch in the message (and races adding/removing
> heartbeat peers), but I think it doesn't matter.
>
> Even so, I think it would be good to expose the send_message_oob()
> interface, and do this in 2 stages so the two changes are decoupled.
> Unless there is some implementation reason why the oob message scheduling
> needs to be done inside the messenger?

Agreed! we could remove heartbeat messenger firstly!

>
> sage
>
>> The advantage to register callback is we can combine multi layers oob
>> messages to one.
>>
>> >
>> > An easier first step would be to just define a
>> > Connection::send_message_oob(Message*).  That would require almost no
>> > changes to the calling code, and avoid having to create the timing
>> > infrastructure inside AsyncMessenger...
>> >
>> > sage
>> >
>> >> void Dispatcher::ms_dispatch_oob(Message*)
>> >>
>> >> handle the oob message with parsing each oob part.
>> >>
>> >> So lots of timer control in user's side could be avoided via callback
>> >> generator. When sending, OOB message could insert the front of send
>> >> message queue but we can't get any help from kernel oob flag since
>> >> it's really useless..
>> >>
>> >> Any suggestion is welcomed!
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >>
>> >>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux