Christian Pernegger wrote:
> OK. Back to the fs again, same command, different device. Still
> glacially slow (and still running), only now the whole box is at a
> standstill, too. cat /proc/cpuinfo takes about 3 minutes (!) to
> complete, I'm still waiting for top to launch (15min and counting).
> I'll leave mke2fs running for now ...
What's the state of your array at this point - is it resyncing?
Yes. Didn't think it would matter (much). Never did before.
It does. If everything works ok, it should not, but it's not your
case ;)
o how about making filesystem(s) on individual disks first, to see
how that will work out? Maybe on each of them in parallel? :)
Running. System is perfectly responsive during 4x mke2fs -j -q on raw devices.
Done. Upper bound for duration is 8 minutes (probably much lower,
forgot to let it beep on completion), which is much better than the 2
hours with the syncing RAID.
Aha. Excellent.
26: 1041479 267 IO-APIC-fasteoi sata_promise
27: 0 0 IO-APIC-fasteoi sata_promise
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 4 0 12864 1769688 10000 0 0 0 146822 539 809 0 26 23 51
Ok. 146MB/sec.
Cpu(s): 1.3%us, 8.1%sy, 0.0%ni, 41.6%id, 46.0%wa, 0.7%hi, 2.3%si, 0.0%st
46.0% waiting
I hope you can interpret that :)
Some ;)
o try --assume-clean when creating the array
mke2fs (same command as in first post) now running on fresh
--assumed-clean array w/o crypto. System is only marginally less
responsive than under idle load, if at all.
So the responsiveness problem is solved here, right? I mean, if
there's no resync going on (the case with --assume-clean), the rest
of the system works as expected, right?
But inode table writing speed is only about 8-10/second. For the
single disk case I couldn't read the numbers fast enough.
Note that mkfs now has to do 3x more work, too - since the device
is 3x (for 4-drive raid5) larger.
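As a rough check on the 3x figure, here is a minimal sketch of the usual RAID5 capacity arithmetic. The function name is made up for illustration, and it assumes equal-sized member disks with one disk's worth of space going to parity.

```python
def raid5_data_disks(n_disks):
    """Usable space of a RAID5 array, in units of one member disk.

    Assumes all members are the same size; one disk's worth of
    space is used for parity.
    """
    if n_disks < 2:
        raise ValueError("need at least one data disk plus parity")
    return n_disks - 1


for n in (3, 4, 5):
    print(f"{n}-drive raid5: {raid5_data_disks(n)}x the size of one member disk")
```

So mkfs on the 4-drive array has roughly three single-disk filesystems' worth of ground to cover, in one process instead of four.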
chris@jesus:~$ cat /proc/interrupts
26: 1211165 267 IO-APIC-fasteoi sata_promise
27: 0 0 IO-APIC-fasteoi sata_promise
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 11092 1813376 10804 0 0 0 13316 535 5201 0 9 51 40
That's 10 times slower than in the case of 4 individual disks.
Cpu(s): 0.0%us, 10.1%sy, 0.0%ni, 55.6%id, 33.7%wa, 0.2%hi, 0.3%si, 0.0%st
and only 33.7% waiting, which is probably due to the lack of
parallelism.
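For anyone who wants to redo the comparison, a small sketch that pulls the "bo" column out of the two vmstat rows quoted above. It assumes "bo" counts 1024-byte blocks written per second, so the figures are approximate; the helper names are made up for illustration.

```python
# Column names taken from the vmstat header line quoted in this thread.
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def parse_vmstat_row(row):
    """Map one vmstat data row onto its column names."""
    values = [int(v) for v in row.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("unexpected number of vmstat columns")
    return dict(zip(VMSTAT_COLUMNS, values))


def to_mib_per_s(bo, block_bytes=1024):
    """Convert blocks/s to MiB/s (assumption: 1 block = 1024 bytes)."""
    return bo * block_bytes / (1024 * 1024)


# The two data rows as posted: 4x mke2fs on raw disks, then mke2fs on the array.
raw_disks = parse_vmstat_row("0 4 0 12864 1769688 10000 0 0 0 146822 539 809 0 26 23 51")
raid5 = parse_vmstat_row("0 1 0 11092 1813376 10804 0 0 0 13316 535 5201 0 9 51 40")

ratio = raw_disks["bo"] / raid5["bo"]
print(f"raw disks: {to_mib_per_s(raw_disks['bo']):.1f} MiB/s")
print(f"raid5:     {to_mib_per_s(raid5['bo']):.1f} MiB/s")
print(f"ratio:     {ratio:.1f}x")
```

Both rows are single samples, so this only says something about the moment they were taken.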
From vmstat I gather that total write throughput is an order of
magnitude slower than on the 4 raw disks in parallel. Naturally the
mke2fs on the raid isn't parallelized but it should still be
sequential enough to get the max for a single disk (~40-60MB/s),
right?
Well, not really. Mkfs is doing many small writes all over the
place, so each is seek+write. And it's synchronous - no next write
gets submitted till the current one completes.
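To put rough numbers on the seek+write point, a toy model: every write pays one seek and then transfers at the sequential rate, and nothing overlaps. The 10 ms seek time and 50 MB/s sequential rate are assumed values picked for illustration, not measurements from this box.

```python
def sync_write_throughput(write_bytes, seek_s, seq_bytes_per_s):
    """Bytes/s when each write is one seek plus one transfer, strictly in series."""
    time_per_write = seek_s + write_bytes / seq_bytes_per_s
    return write_bytes / time_per_write


SEQ_RATE = 50e6   # assumed sequential rate, bytes/s
SEEK_TIME = 0.010  # assumed average seek + rotational delay, seconds

for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    rate = sync_write_throughput(size, SEEK_TIME, SEQ_RATE)
    print(f"{size // 1024:5d} KiB writes: {rate / 1e6:6.2f} MB/s")
```

With these assumptions small writes land far below the sequential rate, and the gap only closes as the writes get large compared to what the disk can transfer during one seek.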
Ok. For now I don't see a problem (other than that there IS a problem
somewhere - obviously). Interrupts are ok. System time (10.1%) in
the second case doesn't look right, but it was 8.1% before...
Only 2 guesses left. And I really mean "guesses", because I can't
say definitely what's going on anyway.
First, try to disable bitmaps on the raid array, and see if it makes
any difference. For some reason I think it will... ;)
And second, the whole thing looks pretty much like a more general
problem discussed here and elsewhere over the last few days. I mean the
handling of parallel reads and writes - when a single write may stall
reads for quite some time and vice versa. I see it every day on disks
without NCQ/TCQ - the system is mostly single-tasking, sorta like
good ol' MS-DOG :) Good TCQ-enabled drives survive very high load
while the system stays more-or-less responsive (and I forget when
I last saw a "bad" TCQ-enabled drive - even a 10 y/o 4GB Seagate has
excellent TCQ support ;). And all modern SATA stuff works pretty
much like old IDE drives, which were designed "for personal use",
or "single-task only" -- even ones that CLAIM to support NCQ in
reality do not.... But that's a long story, and your disks
and/or controllers (or the combination) don't even support NCQ...
/mjt