Re: [PATCH 1/4] imsm: monitor: do not finish migration if there are no failed disks

"Williams, Dan J" <dan.j.williams@xxxxxxxxx> · Thu, 5 Apr 2012 10:56:45 -0700

On Thu, Apr 5, 2012 at 8:29 AM, Przemyslaw Czarnowski
<przemyslaw.hawrylewicz.czarnowski@xxxxxxxxx> wrote:
> The transition from "degraded" to "recovery" made in the OROM is slightly
> different from the same transition in mdadm. The missing disk is not removed
> from the list of raid devices, but just from the map. Therefore mdadm should
> not end migration based on the existence of a list of missing disks, but
> should rely on the count of failed disks.
>
> Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@xxxxxxxxx>
> ---
>  super-intel.c |    9 ++++++++-
>  1 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/super-intel.c b/super-intel.c
> index dad4c4d..e1cd9b8 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -6953,6 +6953,12 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev)
>        if (!super->missing)
>                return;
>
> +       /* When orom adds replacement for missing disk it does
> +        * not remove entry of missing disk, but just updates map with
> +        * new added disk. So it is not enough just to test if there is
> +        * any missing disk, we have to look if there are any failed disks
> +        * in map to stop migration */
> +
>        dprintf("imsm: mark missing\n");
>        /* end process for initialization and rebuild only
>         */
> @@ -6963,7 +6969,8 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev)
>                failed = imsm_count_failed(super, dev, MAP_0);
>                map_state = imsm_check_degraded(super, dev, failed, MAP_0);
>
> -               end_migration(dev, super, map_state);
> +               if (failed)
> +                       end_migration(dev, super, map_state);

Doesn't this need to be something like "failed >= max_degraded", since
some recovery scenarios (raid10, raid6) can continue in the presence of
failed disks?  Can you describe a bit more what the user-visible
behavior is when the OROM leaves these stale entries?

This may be the right thing to do, but handle_missing is also called
in the case where we know a disk we previously had has been removed,
and in that case we certainly want to restart recovery, right?
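
The condition suggested above can be sketched as a standalone predicate.
This is only an illustration: should_end_migration() and the max_degraded
parameter are hypothetical names, not existing mdadm code, and the caller
is assumed to know how many failed disks the array level tolerates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of super-intel.c.
 *
 * failed       - number of failed disks counted in the map
 * max_degraded - assumed number of failed disks the level tolerates
 *                (for example 2 for raid6)
 *
 * Returns true when migration should be ended, using the
 * "failed >= max_degraded" form of the test.
 */
bool should_end_migration(int failed, int max_degraded)
{
	assert(failed >= 0 && max_degraded >= 0);
	return failed >= max_degraded;
}
```

With max_degraded == 2, one failed disk leaves the migration running,
while a second failure ends it. With max_degraded == 1 and no failed
disks the migration is not ended, which matches the "if (failed)" test
in the patch for that case.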