On Thu, Feb 08, 2018 at 10:21:45AM +0100, Martin Wilck wrote:
> Hi Ben,
>
> On Wed, 2018-02-07 at 16:49 -0600, Benjamin Marzinski wrote:
> > commit 0f850db7fceb6b2bf4968f3831efd250c17c6138 "multipathd: clean up
> > set_no_path_retry" has a bug in it.  It made set_no_path_retry
> > never reset mpp->retry_ticks, even if the device was in recovery
> > mode,
> > and there were valid paths.  This meant that adding new paths didn't
> > remove a device from recovery mode, and queueing could get disabled,
> > even while there were valid paths.  This patch fixes that.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > ---
> >  libmultipath/structs_vec.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> > index fbab61f..0de2221 100644
> > --- a/libmultipath/structs_vec.c
> > +++ b/libmultipath/structs_vec.c
> > @@ -343,9 +343,10 @@ static void set_no_path_retry(struct multipath
> > *mpp)
> >  		dm_queue_if_no_path(mpp->alias, 1);
> >  		break;
> >  	default:
> > -		if (mpp->nr_active > 0)
> > +		if (mpp->nr_active > 0) {
> > +			mpp->retry_tick = 0;
> >  			dm_queue_if_no_path(mpp->alias, 1);
> > -		else if (is_queueing && mpp->retry_tick == 0)
> > +		} else if (is_queueing && mpp->retry_tick == 0)
> >  			enter_recovery_mode(mpp);
> >  		break;
> >  	}
>
> Please explain why it's sufficient to do this in the "default" case
> only. Before 0f850db7, set_no_path_retry() reset retry_tick for any
> value of no_path_retry.

Before 0f850db7, set_no_path_retry() was doing this wrong. It was
resetting the timeout whenever __setup_multipath() was called with
reset, even if there were no usable paths. This could keep devices from
disabling queueing like they were supposed to, since retry_count_tick()
would ignore them if retry_tick was 0.

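To make the timer problem concrete, here is a minimal sketch in C. It is
a toy model, not the actual multipathd code: struct mpp_model and the
model_* functions are hypothetical stand-ins, the check interval is
assumed to be 1 so that the recovery timer simply equals no_path_retry,
and the reset_always flag only approximates the old unconditional reset
described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for struct multipath. */
struct mpp_model {
	int nr_active;		/* number of usable paths */
	int retry_tick;		/* > 0: in recovery mode, counting down */
	int no_path_retry;	/* positive retry count ("default" case) */
	bool queueing;		/* queue_if_no_path currently enabled */
};

/* Model of the per-tick countdown: maps with retry_tick == 0 are skipped. */
static void model_retry_count_tick(struct mpp_model *m)
{
	if (m->retry_tick > 0 && --m->retry_tick == 0)
		m->queueing = false;	/* timed out: stop queueing */
}

/*
 * Model of the "default" case. With reset_always set, retry_tick is
 * cleared unconditionally first (approximating the old behavior);
 * otherwise it is cleared only when there are usable paths.
 */
static void model_set_no_path_retry(struct mpp_model *m, bool reset_always)
{
	if (reset_always)
		m->retry_tick = 0;
	if (m->nr_active > 0) {
		m->retry_tick = 0;
		m->queueing = true;
	} else if (m->queueing && m->retry_tick == 0) {
		/* enter recovery mode (assumes a check interval of 1) */
		m->retry_tick = m->no_path_retry;
	}
}

/*
 * Run up to max_ticks timer ticks with no usable paths, re-running the
 * model of set_no_path_retry() every reconf_every ticks. Returns the
 * tick at which queueing was disabled, or -1 if it never was.
 */
static int model_ticks_until_fail(bool reset_always, int reconf_every,
				  int max_ticks)
{
	struct mpp_model m = {
		.nr_active = 0, .retry_tick = 0,
		.no_path_retry = 5, .queueing = true,
	};

	assert(reconf_every > 0);
	model_set_no_path_retry(&m, reset_always);
	for (int t = 1; t <= max_ticks; t++) {
		model_retry_count_tick(&m);
		if (!m.queueing)
			return t;
		if (t % reconf_every == 0)
			model_set_no_path_retry(&m, reset_always);
	}
	return -1;
}
```

Under these assumptions, model_ticks_until_fail(false, 3, 100) returns 5:
the timer runs out even though the map is re-set up every 3 ticks. With
the unconditional reset, model_ticks_until_fail(true, 3, 100) returns -1,
because every re-setup restarts the timer before it can expire.
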
But, to go through the current options:

It makes no sense to reset retry_tick if no_path_retry is set to
NO_PATH_RETRY_UNDEF, NO_PATH_RETRY_FAIL or NO_PATH_RETRY_QUEUE, because
we never go into recovery mode... Well, actually that's not true. I just
noticed a bug in cli_restore_queueing() and cli_restore_all_queueing(),
where we can go into recovery mode if we are set to NO_PATH_RETRY_QUEUE.
This isn't actually a problem, since that sets retry_tick to a negative
number, which means it will get ignored and we will never actually stop
queueing. But that obviously incorrect case aside, we should never be in
recovery mode in the first place unless no_path_retry is set to a
positive number.

The remaining cases where retry_tick was set before 0f850db7 and isn't
now are in the default case when there are no valid paths. In that case,
if we aren't in recovery mode, we should go into it (that's what the
"else if" code does), which means setting retry_tick to something other
than 0. If we have already timed out of recovery mode and queueing is
disabled, mpp->retry_tick is already 0. Finally, if we are currently in
recovery mode, and retry_tick isn't 0, then we should leave it alone.
Otherwise we are simply resetting the no_path_retry timer when we still
don't have any paths, which is one of the bugs the original code was
supposed to fix, like I mentioned above.

> Martin
>
> --
> Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel