RE: File system corruption during setting new size (native/external metadata) after expansion

"Kwolek, Adam" <adam.kwolek@xxxxxxxxx> · Thu, 17 Feb 2011 10:27:05 +0000

> -----Original Message-----
> From: NeilBrown [mailto:neilb@xxxxxxx]
> Sent: Thursday, February 17, 2011 11:04 AM
> To: Kwolek, Adam
> Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: File system corruption during setting new size
> (native/extarnal metatdat) after expansion
> 
> On Thu, 17 Feb 2011 08:45:36 +0000 "Kwolek, Adam"
> <adam.kwolek@xxxxxxxxx>
> wrote:
> 
> > Thank you for workarounds/temporary fixes.
> >
> > Regarding imsm_num_data_members() setting second_map to 0 cannot help
> as it is always called with this parameter set to 0.
> > In this situation, when first map should be always present, we can
> have some race condition.
> > I have no reproduction for mdmon crash you are observing, but I'll try
> some changes in my scripts and I'll carefully watch any signs
> > that can indicate reproduction of this problem.
> >
> > If you could let me know details about changes you made to my
> scenario, it could help.
> >
> 
> This is the script I was using:
> 
> ----------------------------------------------
> export IMSM_NO_PLATFORM=1
> export IMSM_DEVNAME_AS_SERIAL=1
> export MDADM_EXPERIMENTAL=1
> umount /mnt/vol
> mdadm -Ss
> rm -f /backup.bak
> 
> #create container
> mdadm -C /dev/md/imsm0 -amd -e imsm -n 3 /dev/sda /dev/sdb /dev/sdc -R
> 
> #create volume
> mdadm -C /dev/md/raid5vol_0 -amd -l 5 --chunk 64 --size 104857 -n 3 /dev/sda /dev/sdb /dev/sdc -R
> mkfs /dev/md/raid5vol_0
> mount /dev/md/raid5vol_0 /mnt/vol
> 
> #copy some files from current directory
> cp * /mnt/vol
> 
> #add spare
> mdadm --add /dev/md/imsm0 /dev/sdd
> 
> mdadm --wait /dev/md/raid5vol_0
> 
> #start reshape
> mdadm --grow /dev/md/imsm0 --raid-devices 4 --backup-file=/backup.bak
> #mdadm  --wait /dev/md/raid5vol_0
> sleep 10
> while grep reshape /proc/mdstat > /dev/null
> do sleep 1
> done
> while ps axgu | grep 'md[a]dm' > /dev/null
> do sleep 1
> done
> umount /mnt/vol
> fsck -f -n /dev/md/raid5vol_0
> -------------------------------------------------
> 
> I did have an 'mdadm --wait' where the 'while grep reshape' is.  I
> changed it
> because it seemed to be causing problems, but I may have been wrong
> about the
> cause.
> 
> This would fairly reliably result in mdmon dying.
> 
> This is the patch I applied
> -------------------------------------
> diff --git a/super-intel.c b/super-intel.c
> index 5d39d5b..fa195c3 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -1600,6 +1600,7 @@ static __u8 imsm_num_data_members(struct imsm_dev *dev, int second_map)
>          */
>         struct imsm_map *map = get_imsm_map(dev, second_map);
> 
> +       if (map == NULL) map = get_imsm_map(dev, 0);
>         switch (get_imsm_raid_level(map)) {
>         case 0:
>         case 1:
> 
> -----------------------------------
> 
> This was on an oldish source tree (commit 152b223157), so maybe it is
> already
> fixed.
> But without that patch it crashed often, and with it in place it didn't
> crash at all.
> 
> NeilBrown

Thank you for the information.

I saw this problem on older mdadm, and I think it is fixed now.

The problem was in set_array_state() when the second_map parameter was set to -1: the get_imsm_map() implementation could return NULL for second_map == -1.
get_imsm_map() is fixed now (patch: 'imsm: FIX: crash during getting map' /2011-02-03/).

The '-1' passed as second_map was later changed to 0 by Anna in the 'fix: imsm: assemble doesn't restart recovery' /2011-02-13/ patch.
At this moment second_map is never set to '-1'.
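
To illustrate the failure mode, here is a minimal sketch of how a NULL map from the lookup can be guarded with a fallback to the first map, in the spirit of the patch quoted above. It is not the real super-intel.c code: the sketch_* names, the simplified structures, the 'migrating' flag, and the data-member rule (one member less for RAID5) are all assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the imsm structures.
 * The real definitions in super-intel.c are different. */
struct sketch_map {
	int raid_level;
	int num_members;
};

struct sketch_dev {
	struct sketch_map maps[2];
	int migrating;		/* non-zero while a second map exists */
};

/* Assumed model of the older lookup: for second_map == 1 or -1 it
 * returns the second map only if one exists, otherwise NULL. */
static struct sketch_map *sketch_get_map(struct sketch_dev *dev, int second_map)
{
	if (second_map == 0)
		return &dev->maps[0];
	if (second_map == 1 || second_map == -1)
		return dev->migrating ? &dev->maps[1] : NULL;
	return NULL;
}

/* Caller with the same kind of guard as the quoted patch: if the lookup
 * gives NULL, fall back to the first map instead of dereferencing NULL. */
static int sketch_num_data_members(struct sketch_dev *dev, int second_map)
{
	struct sketch_map *map = sketch_get_map(dev, second_map);

	if (map == NULL)
		map = sketch_get_map(dev, 0);

	switch (map->raid_level) {
	case 5:
		/* assumption: one member's worth of space holds parity */
		return map->num_members - 1;
	default:
		return map->num_members;
	}
}
```

Without the NULL check, a call with second_map == -1 on a device that has no second map would dereference a NULL pointer in this sketch, which is the kind of crash discussed in this thread.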

Our lab has not reported any mdmon crashes on the latest mdadm code either.

Anyway, I'll watch for mdmon core dumps during my tests (let's say, more carefully than usual ;)).

BR
Adam