On Wed, 11 Jul 2012 14:20:53 +1000
NeilBrown <neilb@xxxxxxx> wrote:

> On Fri, 06 Jul 2012 11:59:13 +0200 Jes Sorensen <Jes.Sorensen@xxxxxxxxxx>
> wrote:
>
> > NeilBrown <neilb@xxxxxxx> writes:
> > > On Tue, 03 Jul 2012 18:07:02 +0200 Jes Sorensen <Jes.Sorensen@xxxxxxxxxx>
> > > wrote:
> > >
> > >> NeilBrown <neilb@xxxxxxx> writes:
> > >> > On Mon, 02 Jul 2012 15:24:43 +0200 Jes Sorensen <Jes.Sorensen@xxxxxxxxxx>
> > >> > wrote:
> > >> >
> > >> >> Hi Neil,
> > >> >>
> > >> >> I am trying to get the test suite stable on RHEL, but I see a lot of
> > >> >> failures in 03r5assemV1, in particular between these two cases:
> > >> >>
> > >> >> mdadm -A $md1 -u $uuid $devlist
> > >> >> check state U_U
> > >> >> eval $tst
> > >> >>
> > >> >> mdadm -A $md1 --name=one $devlist
> > >> >> check state U_U
> > >> >> check spares 1
> > >> >> eval $tst
> > >> >>
> > >> >> I have tested it with the latest upstream kernel as well and see the
> > >> >> same problems. I suspect it is simply the box that is too fast, ending
> > >> >> up with the raid check completing inbetween the two test cases?
> > >> >>
> > >> >> Are you seeing the same thing there? I tried playing with the max speed
> > >> >> variable but it doesn't really seem to make any difference.
> > >> >>
> > >> >> Any ideas for what we can be done to make this case more resilient to
> > >> >> false positives? I guess one option would be to re-create the array
> > >> >> inbetween each test?
> > >> >
> > >> > Maybe it really is a bug?
> > >> > The test harness set the resync speed to be very slow. A fast box will get
> > >> > through the test more quickly and be more likely to see the array still
> > >> > syncing.
> > >> >
> > >> > I'll try to make time to look more closely.
> > >> > But I wouldn't discount the possibility that the second "mdadm -A" is
> > >> > short-circuiting the recovery somehow.
> > >>
> > >> That could certainly explain what I am seeing.
> > >> I noticed it doesn't
> > >> happen every single time in the same place (from memory), but it is
> > >> mostly in that spot in my case.
> > >>
> > >> Even if I trimmed the max speed down to 50 it still happens.
> > >
> > > I cannot easily reproduce this.
> > > Exactly which kernel and which mdadm do you find it with - just to make sure
> > > I'm testing the same thing as you?
> >
> > Hi Neil,
> >
> > Odd - I see it with
> > mdadm: 721b662b5b33830090c220bbb04bf1904d4b7eed
> > kernel: ca24a145573124732152daff105ba68cc9a2b545
> >
> > I've seen this happen for a while fwiw.
> >
> > Note the box has a number of external drives with a number of my scratch
> > raid arrays on it. It shouldn't affect this, but just in case.
> >
> > The system installed mdadm is a 3.2.3 derivative, but I checked running
> > with PATH=. as well.
>
> Thanks.
> I think I figured out what is happening.
>
> It seems that setting the max_speed down to 1000 is often enough, but not
> always. So we need to set it lower.
> But setting max_speed lowers is not effective unless you also set min_speed
> lower. This is the tricky bit that took me way too long to realised.
>
> So with this patch, it is quite reliable.
>
> NeilBrown
>
> diff --git a/tests/03r5assemV1 b/tests/03r5assemV1
> index 52b1107..bca0c58 100644
> --- a/tests/03r5assemV1
> +++ b/tests/03r5assemV1
> @@ -60,7 +60,8 @@ eval $tst
>  ### Now with a missing device
>  # We don't want the recovery to complete while we are
>  # messing about here.
> -echo 1000 > /proc/sys/dev/raid/speed_limit_max
> +echo 100 > /proc/sys/dev/raid/speed_limit_max
> +echo 100 > /proc/sys/dev/raid/speed_limit_min

Purely from an armchair perspective, don't you need to reduce 'min' first,
and only then lower 'max'? As the patch stands, the first "echo" would have
every right to fail with "Invalid argument" (or something similar) if the
kernel side had a check that max cannot be lower than min.

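A minimal sketch of the ordering I have in mind, in the same shell as the
test suite. The helper name set_speed_limits and its optional directory
argument are hypothetical (nothing like it exists in mdadm's test harness);
the directory argument only exists so the logic can be tried against a
scratch directory instead of /proc/sys/dev/raid. Whether the kernel really
rejects max < min is still an assumption, as above; the point is just never
to pass through an intermediate state where max is below min.

```shell
# Hypothetical helper: set_speed_limits MIN MAX [DIR]
# DIR defaults to /proc/sys/dev/raid; it is a parameter only so the
# ordering logic can be exercised against a scratch directory.
set_speed_limits() {
    new_min=$1
    new_max=$2
    dir=${3:-/proc/sys/dev/raid}

    # Refuse a target that is itself inconsistent.
    if [ "$new_min" -gt "$new_max" ]; then
        echo "set_speed_limits: min ($new_min) > max ($new_max)" >&2
        return 1
    fi

    cur_max=$(cat "$dir/speed_limit_max") || return 1

    # Choose the write order so the intermediate state never has max < min:
    # if the new min still fits under the current max, write min first
    # (the "lowering" case); otherwise raise max first, then min.
    if [ "$new_min" -le "$cur_max" ]; then
        echo "$new_min" > "$dir/speed_limit_min" || return 1
        echo "$new_max" > "$dir/speed_limit_max" || return 1
    else
        echo "$new_max" > "$dir/speed_limit_max" || return 1
        echo "$new_min" > "$dir/speed_limit_min" || return 1
    fi
}
```

In 03r5assemV1 this would amount to "set_speed_limits 100 100" before the
missing-device section and "set_speed_limits 1000 2000" at the end; the
restore takes the max-first branch, which is the order your second hunk
already uses.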
>
>  mdadm -AR $md1 $dev0 $dev2 $dev3 $dev4 #
>  check state U_U
> @@ -124,3 +125,4 @@ mdadm -I -c $conf $dev1
>  mdadm -I -c $conf $dev2
>  eval $tst
>  echo 2000 > /proc/sys/dev/raid/speed_limit_max
> +echo 1000 > /proc/sys/dev/raid/speed_limit_min

-- 
With respect,
Roman
~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."