Re: 2.6.23.1: mdadm/raid5 hung/d-state

On 11/05/2007 03:36 AM, BERTRAND Joël wrote:
> Neil Brown wrote:
>> On Sunday November 4, jpiszcz@xxxxxxxxxxxxxxx wrote:
>>> # ps auxww | grep D
>>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>>> root       273  0.0  0.0      0     0 ?        D    Oct21  14:40 [pdflush]
>>> root       274  0.0  0.0      0     0 ?        D    Oct21  13:00 [pdflush]
>>>
>>> This is the second time this has happened, each time after several
>>> days/weeks of uptime: while doing regular file I/O (decompressing a
>>> file), everything on the device went into D-state.
>>
>> At a guess (I haven't looked closely) I'd say it is the bug that was
>> meant to be fixed by
>>
>> commit 4ae3f847e49e3787eca91bced31f8fd328d50496
>>
>> except that patch applied badly and needed to be fixed with
>> the following patch (not in git yet).
>> These have been sent to stable@ and should be in the queue for 2.6.23.2
> 
>     My linux-2.6.23/drivers/md/raid5.c has contained your patch for a
> long time:
> 
> ...
>         spin_lock(&sh->lock);
>         clear_bit(STRIPE_HANDLE, &sh->state);
>         clear_bit(STRIPE_DELAYED, &sh->state);
> 
>         s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
>         s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
>         s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
>         /* Now to look around and see what can be done */
> 
>         /* clean-up completed biofill operations */
>         if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
>                 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
>                 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
>                 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
>         }
> 
>         rcu_read_lock();
>         for (i=disks; i--; ) {
>                 mdk_rdev_t *rdev;
>                 struct r5dev *dev = &sh->dev[i];
> ...
> 
> but it doesn't fix this bug.
> 

Did that chunk starting with "clean-up completed biofill operations" end
up where it belongs? When applied to 2.6.23, the patch with the big context
puts it in a different place than the original patch does...

Lately I've seen several problems where the context isn't enough to make
a patch apply properly when some offsets have changed. In some cases a
patch won't apply at all because two nearly-identical areas are being
changed and the first chunk gets applied where the second one should,
leaving nowhere for the second chunk to apply.
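
To illustrate the ambiguity, here is a toy sketch. It is not how patch(1)
is actually implemented, and the file contents and the names handle_a and
handle_b are made up for the example. It just lists every position where
a hunk's context lines match, to show that a short context can match two
nearly-identical areas while a longer one matches only the intended area.

```python
def find_context_matches(file_lines, context):
    """Return every 0-based index at which `context` matches `file_lines` exactly."""
    n = len(context)
    return [
        i
        for i in range(len(file_lines) - n + 1)
        if file_lines[i:i + n] == context
    ]


# Hypothetical file with two nearly-identical areas (not the real raid5.c).
file_lines = [
    "static void handle_a(struct stripe_head *sh)",
    "{",
    "        spin_lock(&sh->lock);",
    "        clear_bit(STRIPE_HANDLE, &sh->state);",
    "        clear_bit(STRIPE_DELAYED, &sh->state);",
    "}",
    "static void handle_b(struct stripe_head *sh)",
    "{",
    "        spin_lock(&sh->lock);",
    "        clear_bit(STRIPE_HANDLE, &sh->state);",
    "        clear_bit(STRIPE_DELAYED, &sh->state);",
    "}",
]

# Three lines of context: identical in both areas, so the hunk is ambiguous.
short_context = file_lines[8:11]

# Larger context that includes the function header: only one place matches.
long_context = file_lines[6:11]

print(find_context_matches(file_lines, short_context))  # [2, 8]
print(find_context_matches(file_lines, long_context))   # [6]
```

With the short context, a tool that takes the first match (or the one
nearest a stale offset) can easily put the chunk in the wrong area.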
