interesting case of a hung 'recovery'

Executive summary: after '--add'ing a new member a 'recovery' starts, but 'sync_max' is not
reset to 'max', so the recovery stalls at the old limit.

I should start by saying that this is an old Fedora 19 system,

$ uname -a
Linux e7.eyal.emu.id.au 3.14.27-100.fc19.x86_64 #1 SMP Wed Dec 17 19:36:34 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

so the issue may have been fixed since.

I had a disk fail in a raid6. After some 'pending' sectors were logged I decided to run a 'check'
around that location (set sync_min/sync_max, then echo 'check' to sync_action). Sure enough it
elicited disk errors, but the disk did not recover and it was kicked out of the array. Moreover,
it became unresponsive. It needed a power cycle, so I shut down and rebooted the machine.
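
For readers who have not done a ranged check before, the steps above can be sketched roughly
as follows. This is only an illustration: the function name ranged_check and the MD_DIR
variable are my own, md127 is the array named later in this note, and the sector values are
whatever window you want to scan (the actual values used here were not recorded). As far as
I know the kernel may reject limits that are not aligned to the chunk size.

```shell
# Sketch only: restrict an md 'check' to a window of sectors.
# MD_DIR is an assumed location; adjust it for your array.
MD_DIR="${MD_DIR:-/sys/block/md127/md}"

# ranged_check START END
#   START and END are sector numbers chosen by the caller (hypothetical values).
ranged_check() {
    start="$1"
    end="$2"
    # Write the upper limit first: starting from the defaults (0 and 'max'),
    # this keeps sync_min <= sync_max at every step.
    echo "$end"   > "$MD_DIR/sync_max"   || return 1
    echo "$start" > "$MD_DIR/sync_min"   || return 1
    # Ask md to start a check; it should pause when it reaches sync_max.
    echo check    > "$MD_DIR/sync_action" || return 1
}

# Example call (values are made up):
#   ranged_check 1000000 1100000
```

Note that the limits stay in place after the check ends, which is what bit me later.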

Not one to give up easily I tried the check again, with the same result.
It was time to '--remove' this array member. I then '--add'ed a new disk which started a recovery.

A few hours later I noticed that it had slowed down. A lot. In fact it had not progressed at
all for a few hours (I was away from the machine).

As I was staring at the screen (for a long while) I realised that it had stopped at 55.5%, and
this number is exactly where the original 'check' failed (I still do not understand how, with my
bad memory, I remembered this number).

I checked 'sync_completed' and it looked correct.
I then examined 'sync_max' and it was wrong: it still held the location where the 'check' had
failed earlier in the day. That is the same sector where the recovery was now paused, so the
two look related.

I decided to take a (small) risk and do
	# echo 'max' >/sys/block/md127/md/sync_max
at which point the recovery moved on. It should be finished in about 5 hours.
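
If you want to test for the same condition before touching anything, something like the
following may help. It is a minimal sketch: the function names and the MD_DIR variable are
mine, and it only assumes that 'sync_max' reads back as the word 'max' when no limit is set.

```shell
# Sketch only: detect and lift a leftover sync_max limit.
# MD_DIR is an assumed location; adjust it for your array.
MD_DIR="${MD_DIR:-/sys/block/md127/md}"

# Print "ok" if sync_max reads 'max', "stale" if it holds a sector number.
sync_max_state() {
    if [ "$(cat "$MD_DIR/sync_max")" = "max" ]; then
        echo ok
    else
        echo stale
    fi
}

# Lift the limit, the same write as the manual fix above.
lift_sync_max() {
    echo max > "$MD_DIR/sync_max"
}
```

Running sync_max_state while a recovery appears stuck shows whether a stale limit is a
plausible cause; lift_sync_max then lets it continue.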

I do not think that it is correct for 'sync_max' to not be set to 'max' when a new member is
added - it surely requires a full recovery.
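
Until one knows the kernel does this by itself, a cautious workaround is to clear the limit by
hand before adding the new member. The wrapper below is a sketch under assumptions: add_member,
MD_DEV, MD_DIR and MDADM are names I made up, and the disk argument is a placeholder. It only
uses the plain 'mdadm <array> --add <disk>' form.

```shell
# Sketch only: make sure sync_max is 'max' before adding a replacement disk.
# All three variables are assumptions; adjust them for your array.
MD_DEV="${MD_DEV:-/dev/md127}"
MD_DIR="${MD_DIR:-/sys/block/md127/md}"
MDADM="${MDADM:-mdadm}"

# add_member DISK
#   DISK is the new member, e.g. a partition on the replacement drive.
add_member() {
    new_disk="$1"
    # A leftover limit from an earlier ranged check would pause the recovery.
    if [ "$(cat "$MD_DIR/sync_max")" != "max" ]; then
        echo max > "$MD_DIR/sync_max" || return 1
    fi
    "$MDADM" "$MD_DEV" --add "$new_disk"
}

# Example call (device name is a placeholder):
#   add_member /dev/sdX1
```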

I really hope (and expect) that this was actually fixed, but this note may help others facing
the same predicament.

cheers

--
Eyal Lebedinsky (eyal@xxxxxxxxxxxxxx)