Raid 10 chunksize

I'm trying to pin down some performance issues with a machine where I work: we are seeing (read-only) query response times blow out by an order of magnitude or more at busy times. Initially we blamed autovacuum, but after tweaking autovacuum_vacuum_cost_delay it is *not* the problem. Then I looked at checkpoints, and although there was some correlation between them and query response times, I'm now thinking that the raid chunksize may well be the issue.

Fortunately there is an identical DR box, so I could do a little testing. Details follow:

Sun 4140, 2x quad-core Opteron 2356, 16GB RAM, 6x 15K 140GB SAS
Debian Lenny
Pg 8.3.6

The disk is laid out using software (md) raid:

4 drives raid 10 *4K* chunksize with database files (ext3 ordered, noatime)
2 drives raid 1 with database transaction logs (ext3 ordered, noatime)
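
For anyone wanting to reproduce the layout, the arrays were created along these lines (partition names follow the iostat output below, but the exact invocation is from memory, so treat it as illustrative):

  mdadm --create /dev/md0 --level=10 --chunk=4 --raid-devices=4 /dev/sd[abcd]1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sd[ef]1

(--chunk is in KB, so --chunk=4 gives the 4K chunksize mentioned above; the raid 1 pair doesn't stripe, so chunksize isn't relevant there.)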

The relevant non-default postgresql.conf params are:

shared_buffers = 2048MB
work_mem = 4MB
maintenance_work_mem = 1024MB
max_fsm_pages = 153600
bgwriter_lru_maxpages = 200
wal_buffers = 2MB
checkpoint_segments = 32
effective_cache_size = 4096MB
autovacuum_vacuum_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 60    # This is high, but seemed to help...

I've run pgbench:

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 24
number of transactions per client: 12000
number of transactions actually processed: 288000/288000
tps = 655.335102 (including connections establishing)
tps = 655.423232 (excluding connections establishing)
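
(For completeness, the run above was done with something like the following; the options are reconstructed from the output, so the exact command line and database name may have differed slightly:

  pgbench -i -s 100 pgbench
  pgbench -c 24 -t 12000 pgbench
)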


Looking at iostat while it is running shows the following (note: sda-sdd are the raid 10 array, sde and sdf the raid 1):

Device:  rrqm/s   wrqm/s    r/s     w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00    56.80   0.00  579.00    0.00   2.47      8.74    133.76  235.10   1.73 100.00
sdb        0.00    45.60   0.00  583.60    0.00   2.45      8.59     52.65   90.03   1.71 100.00
sdc        0.00    49.00   0.00  579.80    0.00   2.45      8.66     72.56  125.09   1.72 100.00
sdd        0.00    58.40   0.00  565.00    0.00   2.42      8.79    135.31  235.52   1.77 100.00
sde        0.00     0.00   0.00    0.00    0.00   0.00      0.00      0.00    0.00   0.00   0.00
sdf        0.00     0.00   0.00    0.00    0.00   0.00      0.00      0.00    0.00   0.00   0.00

Device:  rrqm/s   wrqm/s    r/s     w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00    12.80   0.00   23.40    0.00   0.15     12.85      3.04  103.38   4.27  10.00
sdb        0.00    12.80   0.00   22.80    0.00   0.14     12.77      2.31   73.51   3.58   8.16
sdc        0.00    12.80   0.00   21.40    0.00   0.13     12.86      2.38   79.21   3.63   7.76
sdd        0.00    12.80   0.00   21.80    0.00   0.14     12.70      2.66   90.02   3.93   8.56
sde        0.00  2546.80   0.00  146.80    0.00  10.53    146.94      0.97    6.38   5.34  78.40
sdf        0.00  2546.80   0.00  146.60    0.00  10.53    147.05      0.97    6.38   5.53  81.04

Device:  rrqm/s   wrqm/s    r/s     w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00   231.40   0.00  566.80    0.00   3.16     11.41    124.92  228.26   1.76  99.52
sdb        0.00   223.00   0.00  558.00    0.00   3.06     11.23     46.64   83.55   1.70  94.88
sdc        0.00   230.60   0.00  551.60    0.00   3.07     11.40     94.38  171.54   1.76  96.96
sdd        0.00   231.40   0.00  528.60    0.00   2.94     11.37    122.55  220.81   1.83  96.48
sde        0.00  1495.80   0.00   99.00    0.00   6.23    128.86      0.81    8.15   7.76  76.80
sdf        0.00  1495.80   0.00   99.20    0.00   6.26    129.24      0.73    7.40   7.10  70.48
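
(The snapshots above are extended device stats in MB units, i.e. the output of something like

  iostat -mx 5

where the 5 second interval is illustrative rather than exactly what I used.)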

Top looks like:

Cpu(s): 2.5%us, 1.9%sy, 0.0%ni, 71.9%id, 23.4%wa, 0.2%hi, 0.2%si, 0.0%st
Mem:  16474084k total, 15750384k used,   723700k free,  1654320k buffers
Swap:  2104440k total,      944k used,  2103496k free, 13552720k cached

It looks to me like we are maxing out the raid 10 array, and I suspect the chunksize (4K) is the culprit. However, as this is a pest to change (!), I'd like some opinions on whether I'm jumping to conclusions. I'd also appreciate comments about what chunksize to use (I've tended to use 256K in the past, but what are folks preferring these days?).
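
For anyone checking their own setup, the current chunksize can be confirmed with something like the following (the md device name is whatever your array is assembled as):

  cat /proc/mdstat
  mdadm --detail /dev/md0

As far as I know md has no way to change the chunksize in place, hence the "pest to change" comment: the array has to be recreated with a new --chunk value and the data restored.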

regards

Mark



