Re: Raid5 race patch (fwd)

Neil Brown <neilb@cse.unsw.edu.au> · Sat, 16 Mar 2002 21:00:46 +1100 (EST)

On Friday March 15, gody@master.slon.net wrote:
> On Fri, 15 Mar 2002, Neil Brown wrote:
> 
> New development.
> 
> I just discovered one thing, which might be different from Your setup.
> 
> 1. We have autodetection
> 2. in rc.boot scripts we have also:
> 
> if [ -s /etc/raidtab -a -x /sbin/raidstart ]; then
>   /sbin/raidstart -a
>   [ -x /sbin/raid0run ] &&
>   [ -n "$(sed -ne '/^[^#]*raid-level[  ]*0/p' /etc/raidtab)" ] &&
>     /sbin/raid0run -a
> elif [ -s /etc/mdtab -a -x /sbin/mdadd ]; then
>   /sbin/mdadd -ar
> fi 

I think that is the clue I needed, thanks.

Try this patch:

--- md.c	2002/03/16 09:31:01	1.4
+++ md.c	2002/03/16 09:32:49
@@ -2738,7 +2738,7 @@
 				printk(KERN_WARNING "md: array md%d already exists!\n",
 								mdidx(mddev));
 				err = -EEXIST;
-				goto abort;
+				goto abort_unlock;
 			}
 		default:;
 	}
@@ -2776,7 +2776,7 @@
 			if (err) {
 				printk(KERN_WARNING "md: autostart %s failed!\n",
 					partition_name((kdev_t)arg));
-				goto abort;
+				goto abort_unlock;
 			}
 			goto done;
 
@@ -2792,10 +2792,6 @@
 		goto abort;
 	}
 
-	if (err) {
-		printk(KERN_INFO "md: ioctl lock interrupted, reason %d, cmd %d\n",err, cmd);
-		goto abort;
-	}
 	/* if we don't have a superblock yet, only ADD_NEW_DISK or STOP_ARRAY is allowed */
 	if (!mddev->sb && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY && cmd != RUN_ARRAY) {
 		err = -ENODEV;


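We weren't unlocking the mddev on some error paths in md_ioctl: two
of them did "goto abort" while still holding the lock, when they
should have gone to abort_unlock.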
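
To illustrate the shape of the bug, here is a rough userspace sketch.
It is not the md.c code: fake_dev, fake_ioctl and lock_is_free are
made-up names, and a pthread mutex stands in for the mddev lock.  Only
the two label names are borrowed from the patch.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for an mddev and its lock. */
struct fake_dev {
	pthread_mutex_t lock;
	int has_sb;	/* nonzero once a "superblock" is loaded */
};

/* Returns 1 if nobody holds dev->lock, 0 otherwise. */
static int lock_is_free(struct fake_dev *dev)
{
	if (pthread_mutex_trylock(&dev->lock) != 0)
		return 0;
	pthread_mutex_unlock(&dev->lock);
	return 1;
}

/*
 * Sketch of an ioctl-style handler.  Failures before the lock is
 * taken jump to "abort"; failures after it must go to "abort_unlock".
 */
static int fake_ioctl(struct fake_dev *dev, int cmd)
{
	int err = 0;

	if (cmd < 0) {
		err = -EINVAL;
		goto abort;		/* lock not held yet */
	}

	pthread_mutex_lock(&dev->lock);

	if (!dev->has_sb) {
		err = -ENODEV;
		goto abort_unlock;	/* lock held: must release it */
	}

	/* ... real work would happen here ... */

abort_unlock:
	pthread_mutex_unlock(&dev->lock);
abort:
	return err;
}
```

If the -ENODEV path used "goto abort" instead, the function would
return with the lock still held, and the next caller to take it would
block forever.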



> 
> So after commenting those out, there is no more freeze on cat
> /proc/mdstat and resynchronization went through OK.
> 
> So one of the above commands still causes some deadlock.
> 
> I wrapped all down/up and lock calls with printk's. Perhaps the following
> boot print will help to find what's wrong (I have more of those from when
> it is running, if You are interested, since it looks like we have way too
> much locks now in the code, like on every cat /proc/mdstat):

I don't think it really is "way too much locks".
When you 
   cat /proc/mdstat
you really need to lock everything so that you can get a consistent
image out.  You really don't want a device to be removed while walking
a list of devices to create the content of /proc/mdstat.
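
A small userspace sketch of the same idea, assuming a singly linked
list guarded by one pthread mutex (snap_dev, add_dev, remove_dev and
format_status are hypothetical names; the real code walks the md lists
under its own locks):

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical device record on a singly linked list. */
struct snap_dev {
	int minor;
	struct snap_dev *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct snap_dev *dev_list;

/* Prepend a device to the list. */
static void add_dev(struct snap_dev *d)
{
	pthread_mutex_lock(&list_lock);
	d->next = dev_list;
	dev_list = d;
	pthread_mutex_unlock(&list_lock);
}

/* Unlink a device; takes the same lock that the reader takes. */
static void remove_dev(struct snap_dev *d)
{
	struct snap_dev **pp;

	pthread_mutex_lock(&list_lock);
	for (pp = &dev_list; *pp; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
}

/*
 * Build a status line while holding the lock for the whole walk, so
 * no entry can be unlinked half way through.  Returns the number of
 * characters written, not counting the terminating NUL.
 */
static int format_status(char *buf, size_t len)
{
	struct snap_dev *d;
	size_t used = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	pthread_mutex_lock(&list_lock);
	for (d = dev_list; d; d = d->next) {
		int n = snprintf(buf + used, len - used, "md%d ", d->minor);

		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)n;
	}
	pthread_mutex_unlock(&list_lock);

	return (int)used;
}
```

The lock is only taken when someone actually reads the status or
changes the list, which is why it shows up on every cat but should be
quiet otherwise.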

Most of the time there would be much less locking activity.
The only lock that would be claimed often would be the all_mddevs_sem
which is claimed every time an I/O request goes through.  This is
probably not ideal, but I couldn't quickly see a better way to do it.
I will ponder this one a bit more when I get time.

Thanks again for persevering,
NeilBrown