On Tue, 25 Feb 2014 00:01:42 +0200 Denis Golovan <denis.golovan@xxxxxxxxx> wrote:

> Hi all
>
> I am struggling to diagnose a strange freeze of a software RAID5 array.
> My RAID5 consists of 4 Toshiba SATA drives and has an ext4 filesystem on top of it.
>
> It works fine unless I start several processes writing intensively to it.
> At first, it looks like the system is under high pressure, then the
> system starts lagging a lot and a hard freeze always follows after
> several minutes.
>
> No errors in system log, nothing is emitted to console. Just hard
> freeze with HDD light always on. I tried enabling kernel network
> logging to another machine and again no information when hanging.
> After reboot, my array starts reconstruction and finishes without
> errors.
>
> I tried disabling quotas and barriers for ext4.
> After disabling barriers, it almost seemed to work, but after some
> time the same hard freeze happens.
>
> I tested the same hardware configuration under Linux v3.10, 3.11, 3.12
> and now 3.13.5 (all x86 arch) behaves the same way. The same issue can
> be reproduced easily.
>
> So now I tested everything Google suggests on the matter.
> Could you give a hint on how to debug this issue?
>

The most useful thing for debugging a hard freeze is the alt-sysrq-T
output when it is frozen. Typing that magic sequence should always
produce some output unless it is hard-frozen with interrupts disabled.

So make sure you can produce the output when the system is working
properly (to a log file or the network console would be ideal), then
when it hangs, produce the output again. You probably need to have a
text console rather than a graphic console for it to work.

If it is hard-hanging with interrupts disabled, then it gets tricky.
I thought there was some NMI-based lockup detector which would warn if
that happened, but I cannot find it just now.

NeilBrown
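
As a rough sketch of the "make sure you can produce the output while the
system is still working" step: the Python below checks whether the
keyboard sequence is permitted and then requests the same task dump from
software. It assumes the standard Linux procfs files
/proc/sys/kernel/sysrq and /proc/sysrq-trigger and the documented sysrq
bitmask (1 enables every function, bit 8 permits process dumps). The
function names are invented for illustration. Writing to the trigger
file needs root, and the dump goes to the kernel log, not to stdout.

```python
from pathlib import Path

# Bit in kernel.sysrq that permits debugging dumps of processes
# (which includes alt-sysrq-T) from the keyboard.
SYSRQ_DUMP_BIT = 0x8


def keyboard_task_dump_allowed(sysrq_value: int) -> bool:
    """Return True if this kernel.sysrq value permits alt-sysrq-T."""
    if sysrq_value == 1:  # 1 means "all sysrq functions enabled"
        return True
    return bool(sysrq_value & SYSRQ_DUMP_BIT)


def read_sysrq_setting(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current kernel.sysrq value (path is an assumed default)."""
    return int(Path(path).read_text().strip())


def trigger_task_dump(trigger_path: str = "/proc/sysrq-trigger") -> None:
    """Request the same task dump as alt-sysrq-T (requires root)."""
    with open(trigger_path, "w") as trigger:
        trigger.write("t")


if __name__ == "__main__":
    value = read_sysrq_setting()
    if keyboard_task_dump_allowed(value):
        print("alt-sysrq-T is permitted from the keyboard")
    else:
        print(f"kernel.sysrq={value}: alt-sysrq-T is not permitted from the keyboard")
    # Dry run while the system is healthy; then check dmesg or the
    # network console to confirm the dump actually arrived there.
    trigger_task_dump()
```

Once the machine is frozen a script like this will most likely not run,
so it is only useful for the dry run; during the hang itself the
keyboard sequence on a text console is what is left.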