Re: Peering in crimson

On Sat, Mar 23, 2019 at 6:08 AM Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
> On 3/22/19 4:43 PM, Sam Just wrote:
> > As I mentioned on https://github.com/ceph/ceph/pull/27071 I'm pretty confident
> > that the fastest way to get peering working in crimson is to extract the logic
> > for dealing with the peering state, messages, and state transitions from PG.h
> > into a module we can directly reuse in Crimson.  As it happens, most of the
> > heavy lifting was already done with the introduction of the state machine.
> > Each peering message/event gets injected into the state machine and results in
> > a set of side effects in the RecoveryCtx to be executed asynchronously.  The
> > handlers do not actually perform any IO or messaging operations except by
> > populating RecoveryCtx.  As such, in principle, it's already well suited to a
> > seastar world.
> >
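
For anyone less familiar with that part of PG.h, the pattern Sam describes
looks roughly like the following. This is only a minimal sketch with made-up
names (RecoveryCtxSketch, PeeringMachineSketch, handle_notify), not the
actual PG/RecoveryMachine code: the handler mutates in-memory state and
records its side effects in the ctx, and the caller decides how to execute
them afterwards.

  // Minimal sketch of the "handlers only populate a ctx" pattern.
  // All names here are hypothetical, not the real PG.h types.
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  struct RecoveryCtxSketch {
    // messages to send once the event has been handled
    std::vector<std::pair<int /*peer*/, std::string /*msg*/>> outgoing;
    // metadata updates to persist in one transaction
    std::vector<std::string> pending_writes;
  };

  struct PeeringMachineSketch {
    // Injecting an event updates in-memory state and fills the ctx,
    // but performs no IO or messaging itself.
    void handle_notify(int from, const std::string& info,
                       RecoveryCtxSketch& ctx) {
      peer_info[from] = info;                       // pure in-memory update
      ctx.pending_writes.push_back("info from peer " + std::to_string(from));
      ctx.outgoing.emplace_back(from, "ack");
    }
    std::map<int, std::string> peer_info;
  };

Because the handler never blocks on IO, the classic OSD can flush the ctx on
its worker threads, while crimson could presumably turn the same accumulated
side effects into seastar futures instead.
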
> > I've started working on that refactor.  It's been straightforward (if fiddly
> > and time consuming) so far to move RecoveryMachine along with the appropriate
> > PG state (peer_info etc) and methods (proc_replica_info etc) into a separate
> > file/class such that classicPG and crimsonPG are only referenced through a
> > narrow interface for things like notification of state changes.
> >
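To make the "narrow interface" idea a bit more concrete, here is a rough
sketch (again with hypothetical names, not the actual refactor): the
extracted peering state only talks back to the owning PG through a small
listener, so classicPG and crimsonPG can each implement it in their own way.

  // Hypothetical sketch of a narrow callback interface between the
  // shared peering module and the classic/crimson PG implementations.
  #include <string>

  struct PGListenerSketch {
    virtual ~PGListenerSketch() = default;
    // e.g. notification of peering state changes
    virtual void on_state_changed(const std::string& new_state) = 0;
  };

  class PeeringStateSketch {
  public:
    explicit PeeringStateSketch(PGListenerSketch* l) : listener(l) {}
    void enter_state(const std::string& s) {
      current = s;
      listener->on_state_changed(s);  // only contact with the owning PG
    }
  private:
    PGListenerSketch* listener;
    std::string current;
  };

classicPG and crimsonPG would each derive from such a listener and embed the
shared peering state, keeping the state-transition logic in one place.
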
> > Given that we want crimson to be compatible (at least for now) with RADOS as we
> > know it, I think this is probably the right path.  Of course, adopting the

Agreed. I think we will have the luxury of changing the peering protocol
later, once we stabilize the I/O path; at that point we can step back and
revisit the peering machinery if it hurts performance or for other reasons.
But at the moment we do need working peering in crimson-osd, and it should
interoperate with the classic OSD.

> > existing protocol does mean adopting some of the same overhead in that
> > we still in some sense need to persist certain log/info structures
> > required by peering -- though we can of course do it in a radically
> > different way or ignore some of the IO overhead for testing purposes.
> > Nevertheless, I think it represents at least a decent next step.

+1
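
One way to read the "ignore some of the IO overhead for testing purposes"
point, sketched with hypothetical names: put log/info persistence behind a
small store interface so a test build can substitute a no-op backend while
the real OSD queues the writes into an object-store transaction.

  // Purely illustrative; none of these types exist in the tree.
  #include <iostream>
  #include <string>

  struct PeeringStoreSketch {
    virtual ~PeeringStoreSketch() = default;
    virtual void persist(const std::string& key, const std::string& value) = 0;
  };

  // Real backend: queue the update for an object-store transaction.
  struct DiskStoreSketch : PeeringStoreSketch {
    void persist(const std::string& key, const std::string&) override {
      std::cout << "queueing write for " << key << "\n";  // stand-in for IO
    }
  };

  // Test backend: drop the write, keeping only in-memory state.
  struct NullStoreSketch : PeeringStoreSketch {
    void persist(const std::string&, const std::string&) override {}
  };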

> > -Sam
>
>
> That is awesome Sam.  Glad to have you back. :)
>
>
> Mark
>


-- 
Regards
Kefu Chai


