Re: [PATCH] [PATCH] libmultipath: return 'ghost' state when port is in transition

On Mon, 2023-03-06 at 13:04 -0600, Benjamin Marzinski wrote:
> On Mon, Mar 06, 2023 at 12:46:50PM +0100, Martin Wilck wrote:
> > Hi Brian,
> > 
> > On Sat, 2023-03-04 at 12:49 -0800, Brian Bunker wrote:
> > > > 
> > > The checking for standby is 14 years old, and says that TUR
> > > returns a unit attention when the path is in standby. I am not
> > > sure why that wouldn't be handled by this code above: I would
> > > think there should be just one unit attention for each I_T_L
> > > when an ALUA state change happens. Not sure how it exceeds the
> > > retry count.
> > > 
> > > if (key == 0x6) {
> > >     /* Unit Attention, retry */
> > >     if (--retry_tur)
> > >         goto retry;
> > > }
> > > 
> > > From my perspective failing a path for ALUA state standby is
> > > reasonable
> > > since it is not an active state.
> > 
> > I think the historic rationale for using GHOST is that some
> > storage arrays, in particular active-passive configurations, may
> > keep certain port groups in STANDBY state. If STANDBY was
> > classified as FAILED, "multipath -ll" would show all paths of such
> > port groups as FAILED, which would confuse users.
> > 
> > That's what I meant before, multipath's GHOST can mean multiple
> > things depending on the actual hardware in use, explicit/implicit
> > ALUA, etc. Given that today basically all hardware supports ALUA,
> > we could probably do better. But changing the semantics in the
> > current situation is risky and error-prone.
> 
> I am sympathetic to Martin's view that GHOST is an ambiguous state,
> and it's not at all clear that it means "temporarily between
> states". In fact, it usually doesn't.  On the other hand, if we can
> be pretty certain that devices won't keep paths in the TRANSITIONING
> state for an extended time, but we can't be certain what the end
> state will be, I do see the rationale for not failing them
> preemptively.

This is an important point, for which I don't see a general solution.
Unfortunately, if a device is TRANSITIONING, the SCSI spec offers no
means for us to determine what state it's transitioning to, not even
whether the transition is "up" or "down" in the state hierarchy. We can
only guess from the previous state, but it will never be more than just
that, a guess.
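
For illustration, this is roughly how the current state is read out of
a REPORT TARGET PORT GROUPS descriptor (state codes per SPC; the
decoding helper below is a made-up sketch, not libmultipath code). The
point is that TRANSITIONING (0xf) is just another state value, with no
hint about the target state:

    /* Sketch only: decode the asymmetric access state from byte 0 of
     * an RTPG target port group descriptor. State codes are from SPC;
     * the function and enum names are invented for this example. */
    #include <stdint.h>
    #include <stdio.h>

    enum alua_state {
            ALUA_ACTIVE_OPTIMIZED    = 0x0,
            ALUA_ACTIVE_NONOPTIMIZED = 0x1,
            ALUA_STANDBY             = 0x2,
            ALUA_UNAVAILABLE         = 0x3,
            ALUA_TRANSITIONING       = 0xf,
    };

    static enum alua_state decode_aas(const uint8_t *tpg_desc)
    {
            /* bits 3:0 of descriptor byte 0 hold the access state */
            return (enum alua_state)(tpg_desc[0] & 0x0f);
    }

    int main(void)
    {
            uint8_t desc[8] = { 0x0f }; /* array reports TRANSITIONING */

            if (decode_aas(desc) == ALUA_TRANSITIONING)
                    /* nothing here tells us what state comes next */
                    printf("transitioning; target state unknown\n");
            return 0;
    }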

> I wonder if PATH_PENDING makes more sense.  We would retain the
> existing state until the path left the TRANSITIONING state.  The
> question is, are you trying to make paths that are transitioning out
> of a failed state come back sooner, or are you trying to keep paths
> that were in an active state from being preemptively failed.  Using
> PATH_PENDING won't fix the first case, only the second.

A very interesting suggestion. I like it.

I think it makes little sense to try to make such paths "come back
sooner". TRANSITIONING devices aren't usable, and any attempt to use
them will cause an I/O error, after which the kernel driver will
immediately switch them to the FAILED state. PATH_PENDING would cause
devices that are "coming up" to be checked frequently, and thus make
them available within one checker interval of their actually becoming
ACTIVE, which is about the best we can do in the "transitioning up"
case.
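
As a rough sketch of what that could look like in a TUR-style checker
(sense codes per SPC: NOT READY / ASC 0x04 / ASCQ 0x0a is "logical
unit not accessible, asymmetric access state transition", 0x04/0x0b is
the standby case; the helper below is hypothetical, not the actual
libmultipath checker code):

    /* Hypothetical mapping of TUR sense data to a checker result.
     * PATH_UP/PATH_GHOST/PATH_PENDING/PATH_DOWN mirror libmultipath's
     * path states; the function itself is only an illustration. */
    enum path_state { PATH_DOWN, PATH_UP, PATH_GHOST, PATH_PENDING };

    static enum path_state tur_sense_to_state(int key, int asc, int ascq)
    {
            if (key == 0x2 && asc == 0x04 && ascq == 0x0a)
                    /* ALUA state transition: keep the previous state
                     * and re-check soon instead of failing the path */
                    return PATH_PENDING;
            if (key == 0x2 && asc == 0x04 && ascq == 0x0b)
                    /* target port in standby state */
                    return PATH_GHOST;
            return PATH_DOWN;       /* other NOT READY / error cases */
    }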

When the path is going "down" from PATH_UP state, PATH_PENDING would
imply that the kernel DM_STATE would remain as-is (probably "active").
If I/O is happening, the device would sooner or later be used by the
kernel, and the I/O would most probably fail, setting the path to
FAILED. With PATH_GHOST, the path would get a lower priority, and thus
the likelihood of it being used would be decreased, at least with
group_by_prio (although this would mean that the path would be grouped
together with STANDBY paths, see below).

Again, I think the behavior with PATH_PENDING would be the best we can
get. Whether or not the kernel fails the device in the meantime,
multipathd will issue TUR frequently, and eventually see the device
arriving in a new state, which will probably be STANDBY or UNAVAILABLE.

> 
> PATH_PENDING makes sure that if IO to the path does start failing,
> the checker won't keep setting the path back to an active state
> again.  It also avoids another GHOST issue, where the path would end
> up being grouped with any actually passive paths, which isn't what
> we're looking for.

Good point! This causes pointless re-grouping of paths with
group_by_prio for every ALUA transition. OTOH, we see such regrouping
anyway, in particular if the paths don't transition simultaneously (or
we don't detect the transition simultaneously). The only way to avoid
this would be a path grouping algorithm that directly uses the
RTPG-reported target port groups rather than grouping by prio. We
don't have such an algorithm currently.
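
To make the regrouping effect concrete, here is a toy example (not
multipathd code) of grouping purely by priority value; a single path
whose priority changes during a transition ends up in a different
group:

    /* Toy illustration of group_by_prio-style grouping: paths sharing
     * a priority value land in one group, so one path changing prio
     * mid-transition forces a regroup. Not multipathd code. */
    #include <stdio.h>

    struct toy_path { const char *name; int prio; };

    static void print_groups(const struct toy_path *p, int n)
    {
            for (int i = 0; i < n; i++) {
                    int seen = 0;

                    for (int j = 0; j < i; j++)
                            if (p[j].prio == p[i].prio)
                                    seen = 1;
                    if (seen)
                            continue;
                    printf("group (prio %d):", p[i].prio);
                    for (int j = 0; j < n; j++)
                            if (p[j].prio == p[i].prio)
                                    printf(" %s", p[j].name);
                    printf("\n");
            }
    }

    int main(void)
    {
            struct toy_path paths[] = {
                    { "sda", 50 }, { "sdb", 50 },
                    { "sdc", 10 }, { "sdd", 10 },
            };

            print_groups(paths, 4);  /* {sda,sdb} and {sdc,sdd} */
            paths[1].prio = 10;      /* sdb's group starts going "down" */
            print_groups(paths, 4);  /* sdb regrouped with sdc and sdd */
            return 0;
    }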

> The one problem I can think of off the top of my head would be that
> if the device was held in the TRANSITIONING state for a long time,
> multipathd would keep checking it constantly, since PATH_PENDING is
> really meant for cases where the checker hasn't completed yet, and
> we just want to keep looking for the result. I suppose it would be
> possible to add another state that worked just like pending (and
> could even get converted to PATH_PENDING if there was no other state
> to be converted to) but didn't cause us to retrigger the checker so
> quickly.  But if devices really will only be in TRANSITIONING for a
> short time, it might not even be an issue we have to worry about.

The default transitioning timeout is 60s, and in my experience, even if
the hardware overrides it, it's rarely more than a few minutes. After
that, the kernel will set the state to STANDBY.
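
If long-lived TRANSITIONING states ever did become a concern, the idea
you describe could be approximated without a new state by bounding how
long we keep returning PATH_PENDING, along these lines (purely
illustrative; the context struct and its fields are invented for this
sketch):

    /* Sketch only: keep returning PATH_PENDING while the port is in
     * transition, but give up after a deadline, mirroring the ~60s
     * transitioning timeout mentioned above. */
    #include <time.h>

    enum path_state { PATH_DOWN, PATH_UP, PATH_GHOST, PATH_PENDING };

    struct checker_ctx {
            time_t transition_start;  /* 0 if not currently transitioning */
            int transition_tmo;       /* seconds, e.g. 60 */
    };

    static enum path_state handle_transitioning(struct checker_ctx *ctx)
    {
            time_t now = time(NULL);

            if (ctx->transition_start == 0)
                    ctx->transition_start = now;
            if (now - ctx->transition_start <= ctx->transition_tmo)
                    return PATH_PENDING;  /* re-check soon, keep state */
            return PATH_DOWN;  /* transition took too long: fail it */
    }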

Unless we're both overlooking something, I agree with you that
PATH_PENDING is the right thing to do for TRANSITIONING. When a device
is in transition between states, we _want_ to check it often to make
sure we notice when the target state is reached.

We must then be careful not to overload PATH_PENDING with too many
different meanings. But I don't see this as a big issue currently.

Regards
Martin

> Thoughts?
> 
> -Ben
> 
> > 
> > Regards
> > Martin
> 

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel