Re: [PATCH] FIX: imsm: OROM does not recognize degraded arrays

Dan Williams <dan.j.williams@xxxxxxxxx> · Thu, 24 Feb 2011 09:42:53 -0800

Thanks Krzysztof, comments below:

On Wed, Feb 23, 2011 at 10:53 PM, Krzysztof Wojcik
<krzysztof.wojcik@xxxxxxxxx> wrote:
> Defect description:
> When we create an redundant array in mdadm and then degrade it
> by disk removing, Option ROM and Windows OS does not detect any array.
> Reason:
> Metadata created and updated after degrading array is not compatible
> with IMSM standard.
>
> This patch synchronizes the metadata according IMSM requirements.
> Following inconsistencies have been fixed:
> - reset all fields in imsm_dev during creation to avoid random values
> - init dev status during creation to proper state
> - not reset CONFIGURED_DISK flag when disk is missing
> - add ":0" suffix to the serial number for missing/failed disks
> - update medatada signature after takeover operation
> - mark map state as degraded after raid0->raid10 takeover
>
> Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@xxxxxxxxx>
> ---
>  super-intel.c |   27 +++++++++++++++++----------
>  1 files changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/super-intel.c b/super-intel.c
> index c101dca..a40b7e5 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -3437,6 +3437,8 @@ static int init_super_imsm_volume(struct supertype *st, mdu_array_info_t *info,
>                fprintf(stderr, Name": could not allocate raid device\n");
>                return 0;
>        }
> +       memset(dev, 0, (sizeof(*dev) + sizeof(__u32) * (info->raid_disks - 1)));
> +

Consider changing the allocation to calloc, or store the size in a
temporary variable so we don't duplicate the size calculation.

>        strncpy((char *) dev->volume, name, MAX_RAID_SERIAL_LEN);
>        if (info->level == 1)
>                array_blocks = info_to_blocks_per_member(info);
> @@ -3449,8 +3451,7 @@ static int init_super_imsm_volume(struct supertype *st, mdu_array_info_t *info,
>
>        dev->size_low = __cpu_to_le32((__u32) array_blocks);
>        dev->size_high = __cpu_to_le32((__u32) (array_blocks >> 32));
> -       dev->status = __cpu_to_le32(0);
> -       dev->reserved_blocks = __cpu_to_le32(0);
> +       dev->status = (DEV_READ_COALESCING | DEV_WRITE_COALESCING);

The orom will fail to assemble arrays without these status bits set?

>        vol = &dev->vol;
>        vol->migr_state = 0;
>        set_migr_type(dev, MIGR_INIT);
> @@ -5046,6 +5047,7 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
>        __u32 ord;
>        int slot;
>        struct imsm_map *map;
> +       char buf[PATH_MAX];

This is a very large stack allocation which is potentially problematic
for mdmon.

>
>        /* new failures are always set in map[0] */
>        map = get_imsm_map(dev, 0);
> @@ -5057,9 +5059,12 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
>        ord = __le32_to_cpu(map->disk_ord_tbl[slot]);
>        if (is_failed(disk) && (ord & IMSM_ORD_REBUILD))
>                return 0;
> +       sprintf(buf, "%s:0", disk->serial);
> +       int len = strlen(buf);
> +       unsigned int shift = len - MAX_RAID_SERIAL_LEN + 1;
> +       strncpy((char *)disk->serial, &buf[shift], MAX_RAID_SERIAL_LEN);

Other option-roms truncated the serial number just to make sure it did
not match (see mark_missing()), are you sure the ":0" matters?  And if
we are already tolerant of chopping characters why not just
append/overwrite the end of the serial number.

>        disk->status |= FAILED_DISK;
> -       disk->status &= ~CONFIGURED_DISK;

The  Windows driver was observed making this modification, but maybe
only in the "missing" case.  Have you verified that this works in the
failed but still present case (versus the device gone case).

>        set_imsm_ord_tbl_ent(map, slot, idx | IMSM_ORD_REBUILD);
>        if (map->failed_disk_num == 0xff)
>                map->failed_disk_num = slot;
> @@ -6063,16 +6068,16 @@ static int apply_takeover_update(struct imsm_update_takeover *u,
>                        *space_list = *space;
>                        du = (void *)space;
>                        memcpy(du, super->disks, sizeof(*du));
> -                       du->disk.status = FAILED_DISK;
> -                       du->disk.scsi_id = 0;
> +                       du->disk.status = CONFIGURED_DISK | FAILED_DISK;
> +                       du->disk.scsi_id = ~0;

Looks like the code has grown some open coded duplications of what
mark_failure() is doing.  Let's unify this.  Keep subtle compatibility
logic confined to one location.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html