Re: [PATCH] multipathd: the sysfs prioritizer can return stale data

On Tue, Jan 23, 2024 at 06:00:07PM -0800, Brian Bunker wrote:
> When a path is lost and then reinstated later, the ALUA
> device handler will not pick up this change and may
> continue to provide incorrect (stale) information about
> its ALUA state to the 'sysfs' prioritizer if the path's
> priorities were changed prior to the loss of those paths.
> 
> On the loss of an I_T nexus, the path state should not
> continue to use its last known state. Many things could
> have changed, and a lot of time could have passed, between
> the path loss and its eventual reinstatement.
> 
> The 'alua' prioritizer didn't have this issue since it
> always got the ALUA state from the target by sending an RTPG
> request on the path and updated the priority. However, with
> detect_prio enabled by default, the 'sysfs' prioritizer
> will be used on targets that support ALUA rather than the
> 'alua' prioritizer.
> 
> Without re-evaluating the ALUA state when a path returns,
> multipath is left with path states which do not reflect
> the actual ALUA states the target is providing.
> 
> 3624a9370f0f545fc7c3e46a100011010 dm-2 PURE,FlashArray
> size=3.0T features='0' hwhandler='1 alua' wp=rw
> |-+- policy='service-time 0' prio=50 status=active
> | |- 13:0:0:1 sdg 8:96  active ready running
> | |- 12:0:0:1 sdf 8:80  active ready running
> | |- 9:0:0:1  sdc 8:32  active ready running
> | `- 1:0:0:1  sdb 8:16  active ready running
> `-+- policy='service-time 0' prio=10 status=enabled
>   |- 14:0:0:1 sdh 8:112 active ready running
>   |- 15:0:0:1 sdi 8:128 active ready running
>   |- 10:0:0:1 sdd 8:48  active ready running
>   `- 11:0:0:1 sde 8:64  active ready running
> 
>  # sg_rtpg /dev/sdh (Active/Optimized)
> target port group asymmetric access state : 0x00
> 
> Signed-off-by: Brian Bunker <brian@xxxxxxxxxxxxxxx>
> Signed-off-by: Seamus Connor <sconnor@xxxxxxxxxxxxxxx>
> ---
>  multipathd/main.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 230c9d10..dd48be74 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1937,6 +1937,11 @@ reinstate_path (struct path * pp)
>  	else {
>  		condlog(2, "%s: reinstated", pp->dev_t);
>  		update_queue_mode_add_path(pp->mpp);
> +		if (strcmp(pp->prio.name, PRIO_SYSFS) == 0) {
> +			condlog(2, "%s: rescan target to update priorities",
> +				pp->dev_t);
> +			rescan_path(pp->udev);

I can see why this is necessary with the sysfs prioritizer, but AFAICT
rescanning the scsi device will end up calling scsi_execute_cmd to
update the vpd pages, and this can block.  Obviously, we are doing this
after the checker has verified that the path is up, but even so, we're
trying to cut down on the amount of code that can block multipathd on
an inaccessible device, not increase it.

Perhaps it would be better to make the prioritizer itself smarter, so
that it could know when it needs to run the rescan(), and it could do
that in a separate thread.
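
As a rough illustration of the separate-thread idea (a sketch only, not
multipath-tools code): the caller hands the path of the SCSI device's
sysfs "rescan" attribute to a detached worker, so the potentially
blocking write never runs in the calling thread. The names rescan_job,
write_rescan, rescan_worker and rescan_async are hypothetical, and how
the prioritizer would decide that a rescan is needed is left out.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical job descriptor; not a libmultipath structure. */
struct rescan_job {
	char attr[256];	/* e.g. "/sys/block/sdh/device/rescan" */
};

/* Write "1" to the rescan attribute. This call may block in the kernel. */
static int write_rescan(const char *attr)
{
	int fd = open(attr, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, "1", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}

/* Worker thread: do the blocking write, then free the job. */
static void *rescan_worker(void *arg)
{
	struct rescan_job *job = arg;

	if (write_rescan(job->attr) != 0)
		fprintf(stderr, "rescan via %s failed\n", job->attr);
	free(job);
	return NULL;
}

/*
 * Queue a rescan without blocking the caller.
 * Returns 0 if the worker was started, -1 otherwise.
 */
static int rescan_async(const char *attr)
{
	struct rescan_job *job;
	pthread_attr_t pa;
	pthread_t tid;
	int rc;

	if (strlen(attr) >= sizeof(job->attr))
		return -1;
	job = calloc(1, sizeof(*job));
	if (!job)
		return -1;
	strcpy(job->attr, attr);

	pthread_attr_init(&pa);
	pthread_attr_setdetachstate(&pa, PTHREAD_CREATE_DETACHED);
	rc = pthread_create(&tid, &pa, rescan_worker, job);
	pthread_attr_destroy(&pa);
	if (rc != 0) {
		free(job);
		return -1;
	}
	return 0;
}
```

A caller would invoke something like rescan_async("/sys/block/sdh/device/rescan")
and pick up the refreshed state on a later priority update. A real
implementation would also have to limit the number of outstanding
workers and handle the path being removed while a rescan is in flight.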

Thoughts?

-Ben

> +		}
>  	}
>  }
>  
> -- 
> 2.43.0




