Andrew Morton wrote:

>Mikael, this smells like a devicemapper lockup. Could you please test
>2.6.12-rc5 and provide us with a status update?
>

I haven't got any unused disks to try this on atm, but I might be able to
use dmsetup to create a dmraid inside my 2G swap partition and craft a
test which works with the limited space available.
I'll send a report to the list as soon as I've got any results, but it
will probably take some time in any case.

>Thanks.
>

/Mikael

>Mikael Andersson <mikael@xxxxxxxxx> wrote:
>
>>Mikael Andersson wrote:
>>
>>>During heavy io-load a lockup occurs that appears to prevent any disk
>>>output from taking place. fs is reiserfs on two device-mapper mirrored
>>>200G maxtor disks. After the lockup occurs you can do things like 'ls',
>>>but echo > test.txt will hang.
>>>
>>fs is now ext3
>>
>>>A typical workload producing the error is doing:
>>>rsync of large (1GB) files over 100Mbit ethernet
>>>simultaneous compilation / gunzip
>>>
>>Or almost anything that writes something to the disk.
>>
>>>I've disabled preemption, and tried with and without acpi enabled, with
>>>and without smp support (it was smp by default so I switched it off).
>>>Also tried with another nic (rtl8139) since I got an nv_stop_tx:
>>>TransmitterStatus remained busy<6> in the logs. I also tried 2.6.11.7
>>>with the same result.
>>>
>> Tried converting to ext3, same problem, albeit the lockups are less
>>severe. More of the locked processes can be killed and echo > test.txt
>>works. So _some_ io gets through.
>> The output from sysrq-T is somewhat less confusing though: it appears
>>the hung processes are somehow stuck in __generic_unplug_device. I
>>had a look at the assembler, but couldn't make heads or tails of it. The
>>code at __generic_unplug_device+19 was test %eax,%eax immediately
>>preceded by a callq to the test instruction. Obviously something magic
>>(to my eyes) is going on here.
>>
>> Also tried 2.6.12_rc3-mm3
>>
>> I'd really like to find a solution to this since it kinda borks the
>>nice and shiny machine if it can't handle large files without getting
>>into trouble.
>>
>> I've been working on this for two days, have been trying to find
>>similar bug reports, trying a lot of different kernels and kernel
>>options to no avail.
>> I'm a little out of options right now. Any ideas for something to try,
>>patches to test, or some help in understanding what's happening?
>>
>>
>>kmirrord/0 D ffff81003f1bccd8 0 978 9 1731 977 (L-TLB)
>>Call Trace:
>><ffffffff8016a2d6>{cache_alloc_refill+1222}
>><ffffffff804a2f9f>{io_schedule+15}
>>--
>>kjournald D ffff81003e94bcd8 0 1748 1 2060 953 (L-TLB)
>>Call Trace:
>><ffffffff802e9c13>{__generic_unplug_device+19}
>><ffffffff802e9cfd>{generic_unplug_device+189}
>>--
>>rsync D 000000701553dccf 0 6903 6901 (NOTLB)
>>Call Trace:
>><ffffffff802e9c13>{__generic_unplug_device+19}
>><ffffffff802e9cfd>{generic_unplug_device+189}
>>--
>>x86_64-pc-lin D 0000006dc7d23e49 0 13785 13742 (NOTLB)
>>Call Trace:
>><ffffffff802e9cfd>{generic_unplug_device+189}
>><ffffffff8040e3ad>{dm_unplug_all+29}
>>
>>/Mikael Andersson
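
For anyone triaging a similar hang, the sysrq-T excerpt above can be summarised mechanically. The following is only a rough sketch in Python, not anything posted in the thread: the names blocked_by_symbol, TASK_RE and FRAME_RE are made up for illustration, and the regular expressions assume the exact line layout seen in this excerpt (task name, state letter, 16-digit hex value, then "<address>{symbol+offset}" frames). Other kernels print traces differently.

```python
import re
from collections import defaultdict

# Hypothetical patterns, fitted only to the excerpt quoted in this thread.
# Task header: "<name> <state> <16 hex digits> ..."
TASK_RE = re.compile(r"^(\S+)\s+([A-Z])\s+[0-9a-f]{16}\b")
# Trace frame: "<address>{symbol+offset}"
FRAME_RE = re.compile(r"^<[0-9a-f]+>\{(\w+)\+\d+\}")


def blocked_by_symbol(dump):
    """Map each traced symbol to the D-state tasks whose trace contains it."""
    tasks = defaultdict(list)
    current = None
    for raw in dump.splitlines():
        # Drop mail quote markers (">>") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        header = TASK_RE.match(line)
        if header:
            # Only uninterruptible (D) tasks are of interest here.
            current = header.group(1) if header.group(2) == "D" else None
            continue
        frame = FRAME_RE.match(line)
        if frame and current and current not in tasks[frame.group(1)]:
            tasks[frame.group(1)].append(current)
    return dict(tasks)


# The four tasks from the report, copied from the quoted dump.
SAMPLE = """\
kmirrord/0 D ffff81003f1bccd8 0 978 9 1731 977 (L-TLB)
Call Trace:
<ffffffff8016a2d6>{cache_alloc_refill+1222}
<ffffffff804a2f9f>{io_schedule+15}
--
kjournald D ffff81003e94bcd8 0 1748 1 2060 953 (L-TLB)
Call Trace:
<ffffffff802e9c13>{__generic_unplug_device+19}
<ffffffff802e9cfd>{generic_unplug_device+189}
--
rsync D 000000701553dccf 0 6903 6901 (NOTLB)
Call Trace:
<ffffffff802e9c13>{__generic_unplug_device+19}
<ffffffff802e9cfd>{generic_unplug_device+189}
--
x86_64-pc-lin D 0000006dc7d23e49 0 13785 13742 (NOTLB)
Call Trace:
<ffffffff802e9cfd>{generic_unplug_device+189}
<ffffffff8040e3ad>{dm_unplug_all+29}
"""

for symbol, names in blocked_by_symbol(SAMPLE).items():
    print(symbol, names)
```

Grouping by symbol rather than by task makes it easier to see that three of the four blocked tasks share generic_unplug_device in their traces, which matches the observation in the report.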