Mark Kirkwood wrote:
> I'm trying to pin down some performance issues with a machine where I
> work. We are seeing (read only) query response times blow out by an
> order of magnitude or more at busy times. Initially we blamed
> autovacuum, but after a tweak of the cost_delay it is *not* the
> problem. Then I looked at checkpoints... and although there was some
> correlation with them and the query response, I'm thinking that the
> raid chunksize may well be the issue.
>
> Fortunately there is an identical DR box, so I could do a little
> testing. Details follow:
>
> Sun 4140: 2x quad-core Opteron 2356, 16G RAM, 6x 15K 140G SAS,
> Debian Lenny, Pg 8.3.6
>
> The disk is laid out using software (md) raid:
>
> 4 drives raid 10, *4K* chunksize, with database files (ext3 ordered, noatime)
> 2 drives raid 1 with database transaction logs (ext3 ordered, noatime)
>
> The relevant non-default .conf params are:
>
> shared_buffers = 2048MB
> work_mem = 4MB
> maintenance_work_mem = 1024MB
> max_fsm_pages = 153600
> bgwriter_lru_maxpages = 200
> wal_buffers = 2MB
> checkpoint_segments = 32
> effective_cache_size = 4096MB
> autovacuum_vacuum_scale_factor = 0.1
> autovacuum_vacuum_cost_delay = 60    # This is high, but seemed to help...
>
> I've run pgbench:
>
> transaction type: TPC-B (sort of)
> scaling factor: 100
> number of clients: 24
> number of transactions per client: 12000
> number of transactions actually processed: 288000/288000
> tps = 655.335102 (including connections establishing)
> tps = 655.423232 (excluding connections establishing)
>
> Looking at iostat while it is running shows (note sda-sdd raid10,
> sde and sdf raid 1):
>
> Device:  rrqm/s   wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda        0.00    56.80    0.00  579.00    0.00    2.47     8.74   133.76  235.10   1.73 100.00
> sdb        0.00    45.60    0.00  583.60    0.00    2.45     8.59    52.65   90.03   1.71 100.00
> sdc        0.00    49.00    0.00  579.80    0.00    2.45     8.66    72.56  125.09   1.72 100.00
> sdd        0.00    58.40    0.00  565.00    0.00    2.42     8.79   135.31  235.52   1.77 100.00
> sde        0.00     0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
> sdf        0.00     0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
>
> Device:  rrqm/s   wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda        0.00    12.80    0.00   23.40    0.00    0.15    12.85     3.04  103.38   4.27  10.00
> sdb        0.00    12.80    0.00   22.80    0.00    0.14    12.77     2.31   73.51   3.58   8.16
> sdc        0.00    12.80    0.00   21.40    0.00    0.13    12.86     2.38   79.21   3.63   7.76
> sdd        0.00    12.80    0.00   21.80    0.00    0.14    12.70     2.66   90.02   3.93   8.56
> sde        0.00  2546.80    0.00  146.80    0.00   10.53   146.94     0.97    6.38   5.34  78.40
> sdf        0.00  2546.80    0.00  146.60    0.00   10.53   147.05     0.97    6.38   5.53  81.04
>
> Device:  rrqm/s   wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda        0.00   231.40    0.00  566.80    0.00    3.16    11.41   124.92  228.26   1.76  99.52
> sdb        0.00   223.00    0.00  558.00    0.00    3.06    11.23    46.64   83.55   1.70  94.88
> sdc        0.00   230.60    0.00  551.60    0.00    3.07    11.40    94.38  171.54   1.76  96.96
> sdd        0.00   231.40    0.00  528.60    0.00    2.94    11.37   122.55  220.81   1.83  96.48
> sde        0.00  1495.80    0.00   99.00    0.00    6.23   128.86     0.81    8.15   7.76  76.80
> sdf        0.00  1495.80    0.00   99.20    0.00    6.26   129.24     0.73    7.40   7.10  70.48
>
> Top looks like:
>
> Cpu(s):  2.5%us,  1.9%sy,  0.0%ni, 71.9%id, 23.4%wa,  0.2%hi,  0.2%si,  0.0%st
> Mem:  16474084k total, 15750384k used,   723700k free,  1654320k buffers
> Swap:  2104440k total,      944k used,  2103496k free, 13552720k cached
>
> It looks to me like we are maxing out the raid 10 array, and I suspect
> the chunksize (4K) is the culprit. However, as this is a pest to change
> (!) I'd like some opinions on whether I'm jumping to conclusions. I'd
> also appreciate comments about what chunksize to use (I've tended to
> use 256K in the past, but what are folks preferring these days?)
>
> regards
>
> Mark


Hello Mark,

Okay, so take all of this with a pinch of salt, but I have the same config (pretty much) as you, with checkpoint_segments raised to 192.

The 'test' database server is a Q8300 with 8GB RAM and 2 x 7200rpm SATA drives on the motherboard controller, which I then striped together with lvm:

  lvcreate -n data_lv -i 2 -I 64 mylv -L 60G    (-i 2 stripes, -I 64 = 64K stripe size; expandable under lvm2)

That gives me a stripe size of 64.

Running pgbench with the same scaling factors:

starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 24
number of transactions per client: 12000
number of transactions actually processed: 288000/288000
tps = 1398.907206 (including connections establishing)
tps = 1399.233785 (excluding connections establishing)

It's also running ext4dev, but this is the 'playground' server, not the real iron (and I dread to do that on the real iron).

In short, I think that chunksize/stripesize is killing you. Personally, I would go for 64 or 128.
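If you do get a window to rebuild the data array on the DR box, the mdadm side of it is at least short. Something along these lines would give you a 128K chunk -- a sketch only: the md device name and the sda1-sdd1 partitions are my guesses at your layout, not taken from your mail, and recreating the array of course destroys what is on it:

  mdadm --stop /dev/md0                            # assuming md0 is the 4-disk data array
  mdadm --create /dev/md0 --level=10 --chunk=128 \
        --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
  mkfs.ext3 /dev/md0                               # then mount with noatime as before

  # confirm what the array is actually using, before and after:
  mdadm --detail /dev/md0 | grep -i chunk
  cat /proc/mdstat

The slow part is the mkfs and reloading the database, not the mdadm step, which I suspect is why it is such a pest on the production box.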
Anyway, that's just my 2c .. feel free to ignore/scorn/laugh as applicable ;)

Regards

Stef
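P.S. For anyone wanting to repeat the comparison: the exact invocations aren't in either of our mails, but from the output above the runs would have been something close to this ('bench' is just a placeholder database name):

  pgbench -i -s 100 bench          # initialise with scaling factor 100
  pgbench -c 24 -t 12000 bench     # 24 clients, 12000 transactions each

and the per-device numbers look like extended iostat reporting in megabytes, i.e. roughly:

  iostat -xm 5                     # extended stats, MB/s, 5-second intervals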