Raid 10 chunksize

I'm trying to pin down some performance issues with a machine where I work: we are seeing (read-only) query response times blow out by an order of magnitude or more at busy times. Initially we blamed autovacuum, but after tweaking autovacuum_vacuum_cost_delay it is *not* the problem. Then I looked at checkpoints, and although there was some correlation between them and query response times, I'm now thinking that the raid chunksize may well be the issue.

Fortunately there is an identical DR box, so I could do a little testing. Details follow:

Sun 4140, 2x quad-core Opteron 2356, 16GB RAM, 6x 15K 140GB SAS
Debian Lenny
Pg 8.3.6

The disk is laid out using software (md) raid:

4 drives raid 10 *4K* chunksize with database files (ext3 ordered, noatime)
2 drives raid 1 with database transaction logs (ext3 ordered, noatime)
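
For anyone wanting to reproduce the layout, the arrays were created along these lines (partition names follow the iostat output below, but the exact invocation is from memory, so treat it as illustrative):

  mdadm --create /dev/md0 --level=10 --chunk=4 --raid-devices=4 /dev/sd[abcd]1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sd[ef]1

(--chunk is in KB, so --chunk=4 gives the 4K chunksize mentioned above; the raid 1 pair doesn't stripe, so chunksize isn't relevant there.)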

The relevant non-default postgresql.conf params are:

shared_buffers = 2048MB
work_mem = 4MB
maintenance_work_mem = 1024MB
max_fsm_pages = 153600
bgwriter_lru_maxpages = 200
wal_buffers = 2MB
checkpoint_segments = 32
effective_cache_size = 4096MB
autovacuum_vacuum_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 60    # This is high, but seemed to help...

I've run pgbench:

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 24
number of transactions per client: 12000
number of transactions actually processed: 288000/288000
tps = 655.335102 (including connections establishing)
tps = 655.423232 (excluding connections establishing)
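
(For completeness, the run above was done with something like the following; the options are reconstructed from the output, so the exact command line and database name may have differed slightly:

  pgbench -i -s 100 pgbench
  pgbench -c 24 -t 12000 pgbench
)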


Looking at iostat while it is running shows the following (note: sda-sdd are the raid 10 array, sde and sdf the raid 1):

Device:  rrqm/s   wrqm/s    r/s     w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00    56.80   0.00  579.00    0.00   2.47      8.74    133.76  235.10   1.73 100.00
sdb        0.00    45.60   0.00  583.60    0.00   2.45      8.59     52.65   90.03   1.71 100.00
sdc        0.00    49.00   0.00  579.80    0.00   2.45      8.66     72.56  125.09   1.72 100.00
sdd        0.00    58.40   0.00  565.00    0.00   2.42      8.79    135.31  235.52   1.77 100.00
sde        0.00     0.00   0.00    0.00    0.00   0.00      0.00      0.00    0.00   0.00   0.00
sdf        0.00     0.00   0.00    0.00    0.00   0.00      0.00      0.00    0.00   0.00   0.00

Device:  rrqm/s   wrqm/s    r/s     w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00    12.80   0.00   23.40    0.00   0.15     12.85      3.04  103.38   4.27  10.00
sdb        0.00    12.80   0.00   22.80    0.00   0.14     12.77      2.31   73.51   3.58   8.16
sdc        0.00    12.80   0.00   21.40    0.00   0.13     12.86      2.38   79.21   3.63   7.76
sdd        0.00    12.80   0.00   21.80    0.00   0.14     12.70      2.66   90.02   3.93   8.56
sde        0.00  2546.80   0.00  146.80    0.00  10.53    146.94      0.97    6.38   5.34  78.40
sdf        0.00  2546.80   0.00  146.60    0.00  10.53    147.05      0.97    6.38   5.53  81.04

Device:  rrqm/s   wrqm/s    r/s     w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00   231.40   0.00  566.80    0.00   3.16     11.41    124.92  228.26   1.76  99.52
sdb        0.00   223.00   0.00  558.00    0.00   3.06     11.23     46.64   83.55   1.70  94.88
sdc        0.00   230.60   0.00  551.60    0.00   3.07     11.40     94.38  171.54   1.76  96.96
sdd        0.00   231.40   0.00  528.60    0.00   2.94     11.37    122.55  220.81   1.83  96.48
sde        0.00  1495.80   0.00   99.00    0.00   6.23    128.86      0.81    8.15   7.76  76.80
sdf        0.00  1495.80   0.00   99.20    0.00   6.26    129.24      0.73    7.40   7.10  70.48
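
(The snapshots above are extended device stats in MB units, i.e. the output of something like

  iostat -mx 5

where the 5 second interval is illustrative rather than exactly what I used.)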

Top looks like:

Cpu(s): 2.5%us, 1.9%sy, 0.0%ni, 71.9%id, 23.4%wa, 0.2%hi, 0.2%si, 0.0%st
Mem:  16474084k total, 15750384k used,   723700k free,  1654320k buffers
Swap:  2104440k total,      944k used,  2103496k free, 13552720k cached

It looks to me like we are maxing out the raid 10 array, and I suspect the chunksize (4K) is the culprit. However, as this is a pest to change (!), I'd like some opinions on whether I'm jumping to conclusions. I'd also appreciate comments about what chunksize to use (I've tended to use 256K in the past, but what are folks preferring these days?).
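
For anyone checking their own setup, the current chunksize can be confirmed with something like the following (the md device name is whatever your array is assembled as):

  cat /proc/mdstat
  mdadm --detail /dev/md0

As far as I know md has no way to change the chunksize in place, hence the "pest to change" comment: the array has to be recreated with a new --chunk value and the data restored.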

regards

Mark



