Hi,

we (Q-Leap Networks) are in the process of setting up a high-speed
storage cluster and are having trouble getting proper performance. Our
test system has two dual-core CPUs and two dual-channel UW SCSI
controllers connected to two external RAID boxes; as the speed test we
run iozone with 16 GB of data on a Lustre (ldiskfs) filesystem (the
command lines are sketched in the P.S. at the end of this mail). The
RAID boxes internally run RAID6 and are split into two partitions, one
mapped to each SCSI port. Read-ahead is set to 32768.

sdb: system controller 1 -> box 1, controller 1
sdc: system controller 1 -> box 2, controller 1
sdd: system controller 2 -> box 1, controller 2
sde: system controller 2 -> box 2, controller 2

All figures below are in KB/s (iozone's default unit).

Plain disks: sdb1 sdc1 sdd1 sde1
--------------------------------
            write   rewrite  read    reread
1 Thread :  225204  269084   288718  288219
2 Threads:  401154  414525   441005  440564
3 Threads:  515818  528943   598863  599455
4 Threads:  587184  638971   737094  730850

raid1 [sdb1 sde1] [sdc1 sdd1] chunk=8192
----------------------------------------
            write   rewrite  read    reread
1 Thread :  179262  271810   293111  293593
2 Threads:  326260  345276   496189  498250
4 Threads:  333085  308820   686983  679123
8 Threads:  348458  277097   643260  673025

raid10 f2 [sdb1 sdc1 sdd1 sde1] chunk=8192
------------------------------------------
            write   rewrite  read    reread
1 Thread :  215560  323921   466460  436195
2 Threads:  288001  304094   611157  586583
4 Threads:  336072  298115   639925  662107
8 Threads:  243053  183969   665743  638512

As you can see, adding a RAID1 or RAID10 layer already costs a certain
amount of performance, but all within reason. Now comes the real
problem:

raid5 [sdb1 sdc1 sdd1 sde1] chunk=64, stripe_cache_size=32768
-------------------------------------------------------------
            write   rewrite  read    reread
1 Thread :  178540  176061   384928  384653
2 Threads:  218113  214308   379950  376312
4 Threads:  225560  160209   359628  359170
8 Threads:  232252  165669   261981  274043

Performance here is totally limited by pdflush (>80% CPU during
writes), with md0_raid5 eating a substantial percentage too.

raid5 [sdb1 sdc1 sdd1 sde1] chunk=8192, stripe_cache_size=32768
---------------------------------------------------------------
            write   rewrite  read    reread
1 Thread :  171138  185105   424504  428974
2 Threads:  165225  141431   553976  545088
4 Threads:  178189  110153   582999  581266
8 Threads:  177892   99679   568720  594580

This is even stranger. Now pdflush uses less CPU (10-70%), but
md0_raid5 is pegged at >95% CPU during writes.

Three questions:

1) pdflush is limited to one thread per filesystem, which is a
   bottleneck for our usage. Can anything be done there?

2) Why is read performance so lousy with the small chunk size?

3) Why does raid5 use so much more CPU time on writes with the larger
   chunk size? The amount of data to checksum is the same (the write
   speed is roughly the same), but the CPU time used goes way up.
   According to vmstat there are no read-modify-write cycles in there,
   just plain continuous writes.

Regards,
Goswin
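
P.S. For anyone wanting to reproduce: the md arrays above were created
along these lines. This is a reconstruction from the labels in the
tables, not our literal command history; device names as in the mapping
at the top, md device name /dev/md0 assumed throughout.

    # raid10 with far-2 layout, 8192 KB chunk (the "raid10 f2" case)
    mdadm --create /dev/md0 --level=10 --layout=f2 --chunk=8192 \
          --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

    # raid5, 64 KB chunk (first raid5 case; --chunk=8192 for the second)
    mdadm --create /dev/md0 --level=5 --chunk=64 \
          --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1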
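
The tuning mentioned above was applied roughly like this (assuming the
read-ahead value of 32768 refers to blockdev --setra, i.e. 512-byte
sectors, and stripe_cache_size to the md sysfs attribute, i.e. stripe
cache entries per device):

    # stripe cache for the raid5 runs (entries, not bytes)
    echo 32768 > /sys/block/md0/md/stripe_cache_size

    # read-ahead on the block device (in 512-byte sectors)
    blockdev --setra 32768 /dev/md0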
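
And the iozone runs were of this form (a sketch, not the exact
invocation: the record size and the /mnt/lustre mount point are
assumptions; the per-thread file size is chosen so the total is 16 GB,
e.g. 4 threads x 4 GB, -i 0/-i 1 give the write/rewrite and read/reread
columns):

    iozone -t 4 -s 4g -r 1024k -i 0 -i 1 \
           -F /mnt/lustre/f1 /mnt/lustre/f2 /mnt/lustre/f3 /mnt/lustre/f4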