Re: [Linux-cluster] Re: patches.

Benjamin Marzinski <bmarzins@xxxxxxxxxx> · Tue, 16 Nov 2004 14:40:33 -0600

> only issue.
> 
> > > > A P.S. here, I just looked over the agent list rebuilding code, and
> > > > my race detector is beeping full on.  I'll have a ponder before
> > > > offering specifics though.
> > >
> > > That's a question I have.  The ast() function gets called by
> > > dlm_dispatch(), right? If so, I don't see the race. If not, there is
> > > one hell of a dangerous race.  If the agent_list is changing when an agent
> > > is trying to contact the other agents, bad things will most likely
> > > happen.
> > 
> > If the list changes and we don't know about the changes while waiting to 
> > get answers back from other agents, we're dead in the water.  So the 
> > recovery algorithm must handle membership changes that happen in 
> > parallel.  After much pondering, I think I've got a reasonably simple 
> > algorithm, I'll write it up now.
> 
> Um... but since we wait for agent responses in the same poll loop that we wait
> for membership change notifications, these two things already do happen in
> parallel... well... mostly. The only issue I see is that we could get the event
> from magma, and then block trying to get the member_list.  But since
> that's a local call, if that hangs forever, then cman is in trouble,
> and there isn't much we can do anyway. But there is no chance of not getting
> a membership change because we are waiting on an agent response.

Just to clarify, the issue I mentioned earlier is this: if the ast() code and
the rebuild_agent_list() code executed at the same time (which I don't believe
they can), they would both be using the same data structures and could muck
each other up.
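
To make that concrete, here is a rough sketch of why two handlers dispatched
from one poll loop cannot overlap. It is only an illustration under stated
assumptions, not the actual agent code: the two pipes stand in for the magma
membership event descriptor and an agent socket, and every name in it
(on_membership_event, on_agent_response, dispatch_once, run_demo) is
hypothetical. on_membership_event plays the role of rebuild_agent_list() and
on_agent_response plays the role of ast().

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_AGENTS 8

/* Hypothetical stand-in for the shared agent list. */
static int agent_list[MAX_AGENTS];
static int agent_count;
static int in_handler;          /* set while any handler is running */
static int rebuilds, responses; /* counters used only by the demo */

/* Plays the role of rebuild_agent_list(): rewrites the shared list. */
static void on_membership_event(int new_count)
{
    assert(!in_handler);        /* would fire if handlers ever overlapped */
    in_handler = 1;
    if (new_count > MAX_AGENTS)
        new_count = MAX_AGENTS;
    agent_count = new_count;
    for (int i = 0; i < agent_count; i++)
        agent_list[i] = i + 1;
    rebuilds++;
    in_handler = 0;
}

/* Plays the role of ast(): only reads the shared list. */
static void on_agent_response(void)
{
    assert(!in_handler);
    in_handler = 1;
    for (int i = 0; i < agent_count; i++)
        assert(agent_list[i] == i + 1);  /* never sees a half-built list */
    responses++;
    in_handler = 0;
}

/* Poll both descriptors once and run whichever handlers are ready,
 * one after the other. Returns the number of handlers run, or -1. */
static int dispatch_once(int member_fd, int agent_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = member_fd, .events = POLLIN },
        { .fd = agent_fd,  .events = POLLIN },
    };
    unsigned char byte;
    int handled = 0;

    if (poll(fds, 2, timeout_ms) < 0)
        return -1;
    if ((fds[0].revents & POLLIN) && read(member_fd, &byte, 1) == 1) {
        on_membership_event(byte);
        handled++;
    }
    if ((fds[1].revents & POLLIN) && read(agent_fd, &byte, 1) == 1) {
        on_agent_response();
        handled++;
    }
    return handled;
}

/* Make both descriptors readable at once, then dispatch a single time. */
int run_demo(void)
{
    int member_pipe[2], agent_pipe[2];
    unsigned char members = 3, reply = 1;
    int handled;

    if (pipe(member_pipe) < 0 || pipe(agent_pipe) < 0)
        return -1;
    if (write(member_pipe[1], &members, 1) != 1 ||
        write(agent_pipe[1], &reply, 1) != 1)
        return -1;
    handled = dispatch_once(member_pipe[0], agent_pipe[0], 1000);
    close(member_pipe[0]);
    close(member_pipe[1]);
    close(agent_pipe[0]);
    close(agent_pipe[1]);
    return handled;
}
```

Even with both descriptors ready in the same poll() wakeup, the handlers run
back to back on one thread, so the list is never read while it is being
rebuilt. That only holds as long as nothing else (a second thread, a signal
handler) touches the list; if ast() were called from outside the loop, the
list would need a lock.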

> > Well, we have for sure gotten to the interesting part of this, how about 
> > we continue in linux-cluster?
> >
> 
> Sure. But I'm not sure if anyone else is interested in implementation details.
> 
> > Regards,
> > 
> > Daniel
> 
> -Ben