Mark Kirkwood wrote:
> I'm trying to pin down some performance issues with a machine where I
> work. We are seeing (read only) query response times blow out by an
> order of magnitude or more at busy times. Initially we blamed
> autovacuum, but after a tweak of the cost_delay it is *not* the
> problem. Then I looked at checkpoints... and although there was some
> correlation with them and the query response, I'm thinking that the
> raid chunksize may well be the issue.
>
> Fortunately there is an identical DR box, so I could do a little
> testing. Details follow:
>
> Sun 4140: 2x quad-core Opteron 2356, 16G RAM, 6x 15K 140G SAS,
> Debian Lenny, Pg 8.3.6
>
> The disk is laid out using software (md) raid:
>
> 4 drives raid 10, *4K* chunksize, with database files (ext3 ordered, noatime)
> 2 drives raid 1 with database transaction logs (ext3 ordered, noatime)
>
> The relevant non-default .conf params are:
>
> shared_buffers = 2048MB
> work_mem = 4MB
> maintenance_work_mem = 1024MB
> max_fsm_pages = 153600
> bgwriter_lru_maxpages = 200
> wal_buffers = 2MB
> checkpoint_segments = 32
> effective_cache_size = 4096MB
> autovacuum_vacuum_scale_factor = 0.1
> autovacuum_vacuum_cost_delay = 60    # This is high, but seemed to help...
>
> I've run pgbench:
>
> transaction type: TPC-B (sort of)
> scaling factor: 100
> number of clients: 24
> number of transactions per client: 12000
> number of transactions actually processed: 288000/288000
> tps = 655.335102 (including connections establishing)
> tps = 655.423232 (excluding connections establishing)
>
> Looking at iostat while it is running shows (note sda-sdd raid10,
> sde and sdf raid 1):
>
> Device:  rrqm/s   wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda        0.00    56.80    0.00  579.00    0.00    2.47     8.74   133.76  235.10   1.73 100.00
> sdb        0.00    45.60    0.00  583.60    0.00    2.45     8.59    52.65   90.03   1.71 100.00
> sdc        0.00    49.00    0.00  579.80    0.00    2.45     8.66    72.56  125.09   1.72 100.00
> sdd        0.00    58.40    0.00  565.00    0.00    2.42     8.79   135.31  235.52   1.77 100.00
> sde        0.00     0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
> sdf        0.00     0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
>
> Device:  rrqm/s   wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda        0.00    12.80    0.00   23.40    0.00    0.15    12.85     3.04  103.38   4.27  10.00
> sdb        0.00    12.80    0.00   22.80    0.00    0.14    12.77     2.31   73.51   3.58   8.16
> sdc        0.00    12.80    0.00   21.40    0.00    0.13    12.86     2.38   79.21   3.63   7.76
> sdd        0.00    12.80    0.00   21.80    0.00    0.14    12.70     2.66   90.02   3.93   8.56
> sde        0.00  2546.80    0.00  146.80    0.00   10.53   146.94     0.97    6.38   5.34  78.40
> sdf        0.00  2546.80    0.00  146.60    0.00   10.53   147.05     0.97    6.38   5.53  81.04
>
> Device:  rrqm/s   wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda        0.00   231.40    0.00  566.80    0.00    3.16    11.41   124.92  228.26   1.76  99.52
> sdb        0.00   223.00    0.00  558.00    0.00    3.06    11.23    46.64   83.55   1.70  94.88
> sdc        0.00   230.60    0.00  551.60    0.00    3.07    11.40    94.38  171.54   1.76  96.96
> sdd        0.00   231.40    0.00  528.60    0.00    2.94    11.37   122.55  220.81   1.83  96.48
> sde        0.00  1495.80    0.00   99.00    0.00    6.23   128.86     0.81    8.15   7.76  76.80
> sdf        0.00  1495.80    0.00   99.20    0.00    6.26   129.24     0.73    7.40   7.10  70.48
>
> Top looks like:
>
> Cpu(s):  2.5%us,  1.9%sy,  0.0%ni, 71.9%id, 23.4%wa,  0.2%hi,  0.2%si,  0.0%st
> Mem:  16474084k total, 15750384k used,   723700k free,  1654320k buffers
> Swap:  2104440k total,      944k used,  2103496k free, 13552720k cached
>
> It looks to me like we are maxing out the raid 10 array, and I suspect
> the chunksize (4K) is the culprit. However, as this is a pest to change
> (!) I'd like some opinions on whether I'm jumping to conclusions. I'd
> also appreciate comments about what chunksize to use (I've tended to
> use 256K in the past, but what are folks preferring these days?)
>
> regards
>
> Mark


Hello Mark,

Okay, so take all of this with a pinch of salt, but I have the same config (pretty much) as you, with checkpoint_segments raised to 192.

The 'test' database server is a Q8300 with 8GB RAM and 2 x 7200rpm SATA drives on the motherboard controller, which I then striped together with lvm:

  lvcreate -n data_lv -i 2 -I 64 mylv -L 60G    (-i 2 stripes, -I 64 = 64K stripe size; expandable under lvm2)

That gives me a stripe size of 64.

Running pgbench with the same scaling factors:

starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 24
number of transactions per client: 12000
number of transactions actually processed: 288000/288000
tps = 1398.907206 (including connections establishing)
tps = 1399.233785 (excluding connections establishing)

It's also running ext4dev, but this is the 'playground' server, not the real iron (and I dread to do that on the real iron).

In short, I think that chunksize/stripesize is killing you. Personally, I would go for 64 or 128.
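If you do get a window to rebuild the data array on the DR box, the mdadm side of it is at least short. Something along these lines would give you a 128K chunk -- a sketch only: the md device name and the sda1-sdd1 partitions are my guesses at your layout, not taken from your mail, and recreating the array of course destroys what is on it:

  mdadm --stop /dev/md0                            # assuming md0 is the 4-disk data array
  mdadm --create /dev/md0 --level=10 --chunk=128 \
        --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
  mkfs.ext3 /dev/md0                               # then mount with noatime as before

  # confirm what the array is actually using, before and after:
  mdadm --detail /dev/md0 | grep -i chunk
  cat /proc/mdstat

The slow part is the mkfs and reloading the database, not the mdadm step, which I suspect is why it is such a pest on the production box.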
Anyway, that's just my 2c .. feel free to ignore/scorn/laugh as applicable ;)

Regards

Stef
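P.S. For anyone wanting to repeat the comparison: the exact invocations aren't in either of our mails, but from the output above the runs would have been something close to this ('bench' is just a placeholder database name):

  pgbench -i -s 100 bench          # initialise with scaling factor 100
  pgbench -c 24 -t 12000 bench     # 24 clients, 12000 transactions each

and the per-device numbers look like extended iostat reporting in megabytes, i.e. roughly:

  iostat -xm 5                     # extended stats, MB/s, 5-second intervals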