Re: interesting case of a hung 'recovery'

2017-03-09 8:39 GMT+01:00 Eyal Lebedinsky <eyal@xxxxxxxxxxxxxx>:
> Bump.
>
> On 18/02/17 23:14, Eyal Lebedinsky wrote:
>>
>> I should start by saying that this is an old fedora 19 system
>>
>> Executive summary: after '--add'ing a new member a 'recovery' starts but
>> 'sync_max' is not reset.
>>
>> $ uname -a
>> Linux e7.eyal.emu.id.au 3.14.27-100.fc19.x86_64 #1 SMP Wed Dec 17 19:36:34 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
>
> $ sudo mdadm --version
> mdadm - v4.0 - 2017-01-09
>
>> so the issue may have been fixed since.
>>
>> I had a disk fail in a raid6. After some 'pending' sectors were logged I
>> decided to do a 'check' around that location (set sync_min/max and echo
>> 'check'). Sure enough it elicited disk errors, but the disk did not
>> recover and it was kicked out of the array. Moreover it became
>> unresponsive. It needed a power cycle so I shut down and rebooted the
>> machine.
>>
>> Not one to give up easily I tried the check again, with the same result.
>> It was time to '--remove' this array member. I then '--add'ed a new disk
>> which started a recovery.
>>
>> A few hours later I noticed that it slowed down. A lot. It actually did
>> not progress at all for a few hours (I was away from the machine).
>>
>> As I was staring at the screen (for a long while) I realised that it
>> stopped at 55.5%, and this number is exactly where the original 'check'
>> failed (I still do not understand why with my bad memory I remembered
>> this number).
>>
>> I checked 'sync_completed' and it was proper.
>> I then examined 'sync_max' and it was wrong - it had the location where
>> the very early 'check' failed earlier in the day. It was the same sector
>> where it is now paused at - looks related.
>>
>> I decided to take a (small) risk and do
>>     # echo 'max' >/sys/block/md127/md/sync_max
>> at which point the recovery moved on. It should be finished in about 5
>> hours.
>>
>> I do not think that it is correct for 'sync_max' to not be set to 'max'
>> when a new member is added - it surely requires a full recovery.
>>
>> I really hope (and expect) that this was actually fixed, but this note may
>> help others facing the same predicament.
>>
>> cheers
>>
>
> --
> Eyal Lebedinsky (eyal@xxxxxxxxxxxxxx)

You should attach more detailed information so that people can help.

See, for example:
https://raid.wiki.kernel.org/index.php/Asking_for_help
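
Since the report hinges on 'sync_max', the md sync state from sysfs is worth
including as well. Below is a minimal sketch of how that could be collected.
The function names md_sync_report and md_sync_max_is_stale are made up for
illustration, and the default path /sys/block/md127/md is simply the array
named in the report; pass a different directory for another array.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of mdadm or the kernel.

# Print the sync-related sysfs attributes of one md array.
# $1: the array's md sysfs directory (default taken from the report above).
md_sync_report() {
    md_dir="${1:-/sys/block/md127/md}"
    for attr in sync_action sync_min sync_max sync_completed; do
        if [ -r "$md_dir/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$md_dir/$attr")"
        else
            printf '%s: (unreadable)\n' "$attr"
        fi
    done
}

# Succeed (exit status 0) when a recovery is running while sync_max holds
# something other than 'max', which is the situation the report describes.
md_sync_max_is_stale() {
    md_dir="${1:-/sys/block/md127/md}"
    action="$(cat "$md_dir/sync_action" 2>/dev/null)"
    limit="$(cat "$md_dir/sync_max" 2>/dev/null)"
    [ "$action" = "recover" ] && [ "$limit" != "max" ]
}
```

Running "md_sync_report" while the recovery appears stuck, and pasting its
output into the mail, would show at a glance whether 'sync_max' still holds an
old limit. The check only reads sysfs; it does not change anything.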

For the problem you reported, it would also help to provide the kernel
dmesg and the output of blocked tasks via "echo w > /proc/sysrq-trigger",
and maybe also "echo t > /proc/sysrq-trigger".

Cheers,
Jack


