Re: [PATCH] md: raid10: wake up frozen array

Clive Messer <clive@xxxxxxxxxxxxxxxxx> · Sat, 30 Aug 2008 22:30:52 +0100

On Fri, 2008-07-25 at 12:03 -0700, Arthur Jones wrote:
> When rescheduling a bio in raid10, we wake up
> the md thread, but if the array is frozen, this
> will have no effect.  This causes the array to
> remain frozen for eternity.  We add a wake_up
> to allow the array to de-freeze.  This code is
> nearly identical to the raid1 code, which has
> this fix already.

Can someone explain this to me in simple terms?
What causes a bio to be rescheduled?
And "frozen for eternity": what would the effect be, assuming my root
file system is on raid10?

I have a Fedora 9 box using a 4 disk f2 raid10 array. This is the main
partition and root file system. Every couple of days the machine would
hard lock. Sometimes I could still ssh in, but most of the time not,
and I never managed to capture anything in the logs with SysRq. With
the benefit of hindsight, if the kernel was stuck writing log files to
a frozen raid10 array, that could explain it. I assumed faulty hardware
and replaced, one at a time and at considerable expense, the power
supply, motherboard, processor and all 4 disks in the array. Still the
machine would lock up.

What is interesting is that I have managed 5 days of uptime since I
added this one line patch to 2.6.25.14-108.fc9.x86_64. Could someone
confirm whether the hard locks I experienced on this machine are likely
to be resolved by this patch? Has it now made it into an official
kernel release?

> Signed-off-by: Arthur Jones <ajones@xxxxxxxxxxxx>
> ---
>  drivers/md/raid10.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 159535d..d41bebb 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -215,6 +215,9 @@ static void reschedule_retry(r10bio_t *r10_bio)
>  	conf->nr_queued ++;
>  	spin_unlock_irqrestore(&conf->device_lock, flags);
>  
> +	/* wake up frozen array... */
> +	wake_up(&conf->wait_barrier);
> +
>  	md_wakeup_thread(mddev->thread);
>  }
>  
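
For what it is worth, here is my tentative reading of the patch as a
userspace sketch, in case someone can correct it. As far as I can tell,
freeze_array() sleeps on conf->wait_barrier until nr_pending ==
nr_queued + 1, and reschedule_retry() changes nr_queued without waking
that queue. The pthread analogue below is not kernel code: struct conf,
freezer(), run_demo(), the "waiting" and "thawed" flags and the 200 ms
timeout are my own stand-ins, and a condition variable plays the role
of the wait queue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace stand-in for the few conf_t fields involved (assumed layout). */
struct conf {
	pthread_mutex_t lock;
	pthread_cond_t wait_barrier;
	int nr_pending;	/* requests in flight */
	int nr_queued;	/* requests parked for the md thread to retry */
	bool waiting;	/* set once the freezer is about to sleep */
	bool thawed;	/* result: was the freezer woken before the timeout? */
};

/* Analogue of freeze_array(): sleep until nr_pending == nr_queued + 1. */
static void *freezer(void *arg)
{
	struct conf *c = arg;
	struct timespec deadline;
	int rc = 0;

	/* Give up after 200 ms so the demo ends; the kernel wait has no timeout. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 200L * 1000 * 1000;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	c->waiting = true;
	while (c->nr_pending != c->nr_queued + 1 && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&c->wait_barrier, &c->lock, &deadline);
	c->thawed = (rc != ETIMEDOUT);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Analogue of reschedule_retry(): queue one request, optionally wake waiters. */
static void reschedule_retry(struct conf *c, bool with_fix)
{
	pthread_mutex_lock(&c->lock);
	c->nr_queued++;
	pthread_mutex_unlock(&c->lock);

	if (with_fix)
		pthread_cond_broadcast(&c->wait_barrier); /* the added wake_up() */
}

/* Returns true if the freezer was woken, false if it slept until the timeout. */
static bool run_demo(bool with_fix)
{
	struct conf c = { .nr_pending = 2, .nr_queued = 0 };
	struct timespec tick = { 0, 1000000 };	/* 1 ms poll interval */
	pthread_t tid;
	bool ready = false;
	int rc;

	pthread_mutex_init(&c.lock, NULL);
	pthread_cond_init(&c.wait_barrier, NULL);
	rc = pthread_create(&tid, NULL, freezer, &c);
	assert(rc == 0);
	(void)rc;

	/* Make sure the freezer is asleep before the count changes. */
	while (!ready) {
		pthread_mutex_lock(&c.lock);
		ready = c.waiting;
		pthread_mutex_unlock(&c.lock);
		if (!ready)
			nanosleep(&tick, NULL);
	}

	reschedule_retry(&c, with_fix);
	pthread_join(tid, NULL);
	pthread_cond_destroy(&c.wait_barrier);
	pthread_mutex_destroy(&c.lock);
	return c.thawed;
}
```

Called from a small main(), run_demo(true) should return true almost
immediately, while run_demo(false) should return false once the timeout
expires, even though the counts already match. If I have understood the
patch, that second case is the userspace equivalent of the array staying
frozen.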

Regards

Clive
-- 
Clive Messer <clive@xxxxxxxxxxxxxxxxx>
