Re: [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition

On Mon, Mar 06, 2023 at 12:46:50PM +0100, Martin Wilck wrote:
> Hi Brian,
> 
> On Sat, 2023-03-04 at 12:49 -0800, Brian Bunker wrote:
> > > 
> > The checking for standby is 14 years old, and says that TUR returns
> > a unit attention when the path is in standby. I am not sure why that
> > wouldn’t be handled by this code above: I would think there should be
> > just one unit attention for each I_T_L when an ALUA state change
> > happens. Not sure how it exceeds the retry count.
> > 
> > if (key == 0x6) {
> >     /* Unit Attention, retry */
> >     if (--retry_tur)
> >         goto retry;
> > }
> > 
> > From my perspective, failing a path for ALUA state standby is
> > reasonable since it is not an active state.
> 
> I think the historic rationale for using GHOST is that some storage
> arrays, in particular active-passive configurations, may keep certain
> port groups in STANDBY state. If STANDBY was classified as FAILED,
> "multipath -ll" would show all paths of such port groups as FAILED,
> which would confuse users.
> 
> That's what I meant before, multipath's GHOST can mean multiple things
> depending on the actual hardware in use, explicit/implicit ALUA, etc.
> Given that basically all hardware supports ALUA today, we could
> probably do better. But changing the semantics in the current situation
> is risky and error prone.
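
For illustration, here is a minimal C sketch of how a TUR-based checker could classify the sense data of a failed TEST UNIT READY. The enum, decode_sense() and classify_tur_sense() are hypothetical and are not the libmultipath tur checker code. Only the SCSI values are standard: sense key 0x6 is UNIT ATTENTION, sense key 0x2 is NOT READY, and ASC/ASCQ 0x04/0x0B and 0x04/0x0A mean "target port in standby state" and "asymmetric access state transition". The sketch reports STANDBY as GHOST, as described above, and leaves TRANSITIONING as DOWN, which is the case the patch wants to change.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative checker states; the names are hypothetical. */
enum chk_state { CHK_UP, CHK_DOWN, CHK_GHOST, CHK_PENDING, CHK_RETRY };

/* Extract sense key, ASC and ASCQ from fixed (0x70/0x71) or
 * descriptor (0x72/0x73) format sense data. Returns 0 on success. */
static int decode_sense(const unsigned char *sb, size_t len,
                        unsigned *key, unsigned *asc, unsigned *ascq)
{
    unsigned rc;

    if (len < 4)
        return -1;
    rc = sb[0] & 0x7f;                  /* response code, VALID bit masked */
    if (rc == 0x72 || rc == 0x73) {     /* descriptor format */
        *key = sb[1] & 0x0f;
        *asc = sb[2];
        *ascq = sb[3];
        return 0;
    }
    if ((rc == 0x70 || rc == 0x71) && len >= 14) {  /* fixed format */
        *key = sb[2] & 0x0f;
        *asc = sb[12];
        *ascq = sb[13];
        return 0;
    }
    return -1;
}

/* Map a failed TUR's sense data to a checker state. CHK_RETRY asks the
 * caller to reissue the TUR; once the retry budget is used up, a unit
 * attention is reported as CHK_DOWN. */
static enum chk_state classify_tur_sense(const unsigned char *sb, size_t len,
                                         int retries_left)
{
    unsigned key, asc, ascq;

    assert(sb != NULL);
    if (decode_sense(sb, len, &key, &asc, &ascq) != 0)
        return CHK_DOWN;
    if (key == 0x6)                     /* UNIT ATTENTION */
        return retries_left > 0 ? CHK_RETRY : CHK_DOWN;
    if (key == 0x2 && asc == 0x04 && ascq == 0x0b)
        return CHK_GHOST;               /* target port in standby state */
    /* NOT READY 0x04/0x0a (ALUA transition) is reported as DOWN here. */
    return CHK_DOWN;
}
```

A caller would reissue the TUR while the result is CHK_RETRY and decrement its budget, which is the role --retry_tur plays in the snippet quoted above.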

I am sympathetic to Martin's view that GHOST is an ambiguous state, and
it's not at all clear that it means "temporarily between states". In
fact, it usually doesn't. On the other hand, if we can be pretty
certain that devices won't keep paths in the TRANSITIONING state for an
extended time, but we can't be certain what the end state will be, I do
see the rationale for not failing them preemptively.

I wonder if PATH_PENDING makes more sense. We would retain the existing
state until the path left the TRANSITIONING state. The question is: are
you trying to make paths that are transitioning out of a failed state
come back sooner, or are you trying to keep paths that were in an active
state from being preemptively failed? Using PATH_PENDING won't fix the
first case, only the second.
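
The two cases can be sketched as a one-line state-update rule, assuming (hypothetically) that a PENDING checker result simply leaves the previously recorded path state untouched. The enum and next_path_state() are illustrative names, not multipathd's actual code.

```c
#include <assert.h>

/* Illustrative path states; the names are hypothetical. */
enum pstate { P_UP, P_DOWN, P_GHOST, P_PENDING };

/* Apply one checker result to the recorded path state.
 * A PENDING result keeps whatever state the path had before. */
static enum pstate next_path_state(enum pstate prev, enum pstate result)
{
    /* In this sketch the recorded state is always a settled one. */
    assert(prev != P_PENDING);
    return result == P_PENDING ? prev : result;
}
```

With this rule, next_path_state(P_UP, P_PENDING) stays P_UP (the active path is not failed preemptively), while next_path_state(P_DOWN, P_PENDING) stays P_DOWN (the failed path does not come back any sooner).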

PATH_PENDING makes sure that if IO to the path does start failing, the
checker won't keep setting the path back to an active state. It
also avoids another GHOST issue, where the path would end up being
grouped with any actually passive paths, which isn't what we're looking
for.
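
A toy simulation illustrates the first point. It assumes a hypothetical model in which every IO through the path fails, so the path is failed before each check, and a GHOST result counts as usable. The simulate() helper and the model are illustrative only, not multipathd's logic.

```c
#include <assert.h>

/* Illustrative path states; the names are hypothetical. */
enum pstate { P_UP, P_DOWN, P_GHOST, P_PENDING };

/* Hypothetical model: in each cycle an IO error fails the path, then
 * the checker runs and reports `result`. Returns how many times the
 * checker put the failed path back into a usable state. */
static int simulate(enum pstate result, int cycles)
{
    int reinstated = 0;

    assert(cycles >= 0);
    for (int i = 0; i < cycles; i++) {
        enum pstate st = P_DOWN;   /* IO error has just failed the path */
        /* Checker runs: PENDING keeps the current state, anything else replaces it. */
        st = (result == P_PENDING) ? st : result;
        if (st != P_DOWN)
            reinstated++;          /* checker reinstated the failed path */
    }
    return reinstated;
}
```

Under this model, simulate(P_GHOST, 5) reinstates the path on every one of the 5 cycles, while simulate(P_PENDING, 5) never does.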

The one problem I can think of off the top of my head would be that if
the device was held in the TRANSITIONING state for a long time,
multipathd would keep checking it constantly, since PATH_PENDING is
really meant for cases where the checker hasn't completed yet, and we
just want to keep looking for the result. I suppose it would be possible
to add another state that worked just like pending (and could even get
converted to PATH_PENDING if there was no other state to be converted
to) but didn't cause us to retrigger the checker so quickly. But if
devices really will only be in TRANSITIONING for a short time, it might
not even be an issue we have to worry about.
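
One reading of the extra state idea, sketched in C: a hypothetical P_TRANSITIONING result keeps the previous state like PENDING does, is converted to P_PENDING when there is no previous state to keep, and is rechecked on the normal interval rather than the fast pending one. The state names, the PENDING_RECHECK_SEC value and both helpers are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical states; P_NONE means no state has been recorded yet. */
enum pstate { P_NONE, P_UP, P_DOWN, P_GHOST, P_PENDING, P_TRANSITIONING };

/* Hypothetical fast poll interval for an unfinished check. */
#define PENDING_RECHECK_SEC 1

/* Seconds until the next check. Only a genuinely unfinished check is
 * polled quickly; TRANSITIONING waits for the normal interval. */
static int next_check_delay(enum pstate result, int checkint)
{
    assert(checkint > 0);
    return result == P_PENDING ? PENDING_RECHECK_SEC : checkint;
}

/* Resolve a checker result against the recorded state. PENDING and
 * TRANSITIONING keep the previous state; with nothing to keep, they
 * become PENDING. Any other result replaces the state. */
static enum pstate resolve_state(enum pstate prev, enum pstate result)
{
    if (result != P_PENDING && result != P_TRANSITIONING)
        return result;
    return prev == P_NONE ? P_PENDING : prev;
}
```

Keeping the interval decision separate from the state resolution means the slower recheck can be tuned, or dropped, without touching how paths keep their state.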

Thoughts?

-Ben

> 
> Regards
> Martin
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel



