Do I have a hardware or a software problem?

Niels Kristian Schjødt <nielskristian@xxxxxxxxxxxxx> · Mon, 10 Dec 2012 23:51:58 +0100

#### Pitch ######################################################################################
I previously posted this question http://archives.postgresql.org/pgsql-performance/2012-11/msg00289.php about a performance issue with an update query. 
The question evolved into a more general discussion about my setup, and about a lot of I/O wait that I was encountering. Since then, I have gotten a whole lot more familiar with measuring things, and now I "just" need some experienced eyes to judge which direction I should go in - do I have a hardware issue, or a software issue - and what action should I take?
#####  My setup #############################################################################
The use case:
At night time we are doing a LOT of data maintenance, and hence the load on the database is very different from the day time. However we would like to be able to do some of it in the daytime, it's simply just too "heavy" on the database as is right now. The stats shown below is from one of those "heavy" load times.

Hardware: 
  - 32Gb ram 
  - 8 core Xeon E3-1245 processor
  - Two SEAGATE ST33000650NS drives (called sdc and sdd in the stats) in a softeware RAID1 array (called md2 in the stats)
  - Two INTEL SSDSC2CW240A3 SSD drives (called sda and sdb in the stats) in a software RAID1 (called md3 in the stats)

Software:
Postgres 9.2 running on 64bit ubuntu 12.04 with kernel 3.2

Configuration:
# postgresql.conf (a shortlist of everything changed from the default)
data_directory = '/var/lib/postgresql/9.2/main'
hba_file = '/etc/postgresql/9.2/main/pg_hba.conf'
ident_file = '/etc/postgresql/9.2/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/9.2-main.pid'
listen_addresses = '192.168.0.2, localhost'
port = 5432
max_connections = 300
unix_socket_directory = '/var/run/postgresql'
wal_level = hot_standby
synchronous_commit = off
archive_mode = on
archive_command = 'rsync -a %p postgres@192.168.0.4:/var/lib/postgresql/9.2/wals/%f </dev/null'
max_wal_senders = 1
wal_keep_segments = 32
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
datestyle = 'iso, mdy'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
default_statistics_target = 100
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
work_mem = 160MB
wal_buffers = 4MB
checkpoint_segments = 100
shared_buffers = 4GB
checkpoint_timeout = 10min

The kernel has bee tweaked like so:
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
kernel.shmmax = 8589934592
kernel.shmall = 17179869184

The pg_xlog folder has been moved onto the SSD array (md3), and symlinked back into the postgres dir.

##### The stats ###############################################################
These are the typical observations/stats I see in one of these periods:

1)
At top level this is what I see in new relic:
https://rpm.newrelic.com/public/charts/6ewGRle6bmc

2)
When the database is loaded like this, I see a lot of queries talking up to 1000 times as long, as they would when the database is not loaded so heavily.

3)
sudo iostat -dmx (typical usage)
Linux 3.2.0-33-generic (master-db) 	12/10/2012 	_x86_64_	(8 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.52    3.59   26.61     0.22     0.74    65.49     0.01    0.40    0.77    0.35   0.14   0.43
sdb               0.00     8.31    0.03   28.38     0.00     0.97    69.63     0.01    0.52    0.27    0.52   0.15   0.43
sdc               1.71    46.01   34.83  116.62     0.56     4.06    62.47     1.90   12.57   21.81    9.81   1.89  28.66
sdd               1.67    46.14   34.89  116.49     0.56     4.06    62.46     1.58   10.43   21.66    7.07   1.89  28.60
md1               0.00     0.00    0.00    0.00     0.00     0.00     2.69     0.00    0.00    0.00    0.00   0.00   0.00
md0               0.00     0.00    0.11    0.24     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
md2               0.00     0.00   72.99  161.95     1.11     4.06    45.10     0.00    0.00    0.00    0.00   0.00   0.00
md3               0.00     0.00    0.05   32.32     0.00     0.74    47.00     0.00    0.00    0.00    0.00   0.00   0.00

3)
sudo iotop -oa (running for about a minute or so)
TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  292    be/4 root               0.00 B      0.00 B    0.00 % 99.33 % [md2_raid1]
 2815  be/4 postgres     19.51 M     25.90 M  0.00 % 45.49 % postgres: autovacuum worker process   production
32553 be/4 postgres     45.74 M      9.38 M  0.00 % 37.89 % postgres: user production 192.168.0.3(58866) UPDATE
32570 be/4 postgres      6.91 M     35.02 M  0.00 % 16.71 % postgres: user production 192.168.0.3(35547) idle
32575 be/4 postgres      4.06 M     43.90 M  0.00 % 16.62 % postgres: user production 192.168.0.3(35561) SELECT
31673 be/4 postgres      4.14 M     52.16 M  0.00 % 16.24 % postgres: user production 192.168.0.3(39112) idle
32566 be/4 postgres      4.73 M     44.95 M  0.00 % 15.66 % postgres: user production 192.168.0.3(35531) idle
32568 be/4 postgres      4.50 M     33.84 M  0.00 % 14.62 % postgres: user production 192.168.0.3(35543) SELECT
32573 be/4 postgres      3.20 M     34.44 M  0.00 % 13.98 % postgres: user production 192.168.0.3(35559) idle
31590 be/4 postgres      3.23 M     29.72 M  0.00 % 13.90 % postgres: user production 192.168.0.3(50690) idle in transaction
32577 be/4 postgres      5.09 M     25.54 M  0.00 % 13.63 % postgres: user production 192.168.0.3(35563) idle
32565 be/4 postgres      2.06 M     35.93 M  0.00 % 13.41 % postgres: user production 192.168.0.3(35529) SELECT
32546 be/4 postgres      4.48 M     36.49 M  0.00 % 13.39 % postgres: user production 192.168.0.3(56927) UPDATE waiting
32569 be/4 postgres      3.50 M     26.75 M  0.00 % 12.82 % postgres: user production 192.168.0.3(35545) INSERT
31671 be/4 postgres      4.58 M     24.45 M  0.00 % 12.76 % postgres: user production 192.168.0.3(34841) idle in transaction
32551 be/4 postgres      3.26 M     31.77 M  0.00 % 12.06 % postgres: user production 192.168.0.3(58864) idle in transaction
32574 be/4 postgres      5.32 M     32.92 M  0.00 % 11.70 % postgres: user production 192.168.0.3(35560) idle
32572 be/4 postgres      3.00 M     32.66 M  0.00 % 11.66 % postgres: user production 192.168.0.3(35558) UPDATE
32560 be/4 postgres      5.12 M     25.89 M  0.00 % 11.52 % postgres: user production 192.168.0.3(33886) SELECT
32567 be/4 postgres      4.66 M     36.47 M  0.00 % 11.44 % postgres: user production 192.168.0.3(35534) SELECT
32571 be/4 postgres      2.86 M     31.27 M  0.00 % 11.31 % postgres: user production 192.168.0.3(35557) SELECT
32552 be/4 postgres      4.38 M     28.75 M  0.00 % 11.09 % postgres: user production 192.168.0.3(58865) idle in transaction
32554 be/4 postgres      3.69 M     30.21 M  0.00 % 10.90 % postgres: user production 192.168.0.3(58870) UPDATE
  339    be/3 root               0.00 B       2.29 M  0.00 %  9.81 % [jbd2/md2-8]
32576 be/4 postgres      3.37 M     19.91 M  0.00 %  9.73 % postgres: user production 192.168.0.3(35562) idle
32555 be/4 postgres      3.09 M     31.96 M  0.00 %  9.02 % postgres: user production 192.168.0.3(58875) SELECT
27548 be/4 postgres      0.00 B     97.12 M  0.00 %  7.41 % postgres: writer process
31445 be/4 postgres    924.00 K     27.35 M  0.00 %  7.11 % postgres: user production 192.168.0.1(34536) idle
31443 be/4 postgres      2.54 M      4.56 M  0.00 %  6.32 % postgres: user production 192.168.0.1(34508) idle
31459 be/4 postgres   1480.00 K     21.36 M  0.00 %  5.63 % postgres: user production 192.168.0.1(34543) idle
 1801 be/4 postgres   1896.00 K     10.89 M  0.00 %  5.57 % postgres: user production 192.168.0.3(34177) idle
32763 be/4 postgres   1696.00 K      6.95 M  0.00 %  5.33 % postgres: user production 192.168.0.3(57984) SELECT
 1800 be/4 postgres      2.46 M      5.13 M  0.00 %  5.24 % postgres: user production 192.168.0.3(34175) SELECT
 1803 be/4 postgres   1816.00 K      9.09 M  0.00 %  5.16 % postgres: user production 192.168.0.3(34206) idle
32578 be/4 postgres      2.57 M     11.62 M  0.00 %  5.06 % postgres: user production 192.168.0.3(35564) SELECT
31440 be/4 postgres      3.02 M      4.04 M  0.00 %  4.65 % postgres: user production 192.168.0.1(34463) idle
32605 be/4 postgres   1844.00 K     11.82 M  0.00 %  4.49 % postgres: user production 192.168.0.3(40399) idle
27547 be/4 postgres      0.00 B      0.00 B  0.00 %  3.93 % postgres: checkpointer process
31356 be/4 postgres   1368.00 K      3.27 M  0.00 %  3.93 % postgres: user production 192.168.0.1(34450) idle
32542 be/4 postgres   1180.00 K      6.05 M  0.00 %  3.90 % postgres: user production 192.168.0.3(56859) idle
32523 be/4 postgres   1088.00 K      4.33 M  0.00 %  3.59 % postgres: user production 192.168.0.3(48164) idle
32606 be/4 postgres   1964.00 K      6.94 M  0.00 %  3.51 % postgres: user production 192.168.0.3(40426) SELECT
31466 be/4 postgres   1596.00 K      3.11 M  0.00 %  3.47 % postgres: user production 192.168.0.1(34550) idle
32544 be/4 postgres   1184.00 K      4.25 M  0.00 %  3.38 % postgres: user production 192.168.0.3(56861) idle
31458 be/4 postgres   1088.00 K   1528.00 K  0.00 %  3.33 % postgres: user production 192.168.0.1(34541) idle
31444 be/4 postgres    884.00 K      4.23 M  0.00 %  3.27 % postgres: user production 192.168.0.1(34510) idle
32522 be/4 postgres    408.00 K      2.98 M  0.00 %  3.27 % postgres: user production 192.168.0.5(38361) idle
32762 be/4 postgres   1156.00 K      5.28 M  0.00 %  3.20 % postgres: user production 192.168.0.3(57962) idle
32582 be/4 postgres   1084.00 K      3.38 M  0.00 %  2.86 % postgres: user production 192.168.0.5(43104) idle
31353 be/4 postgres      2.04 M      3.02 M  0.00 %  2.82 % postgres: user production 192.168.0.1(34444) idle
31441 be/4 postgres    700.00 K      2.68 M  0.00 %  2.64 % postgres: user production 192.168.0.1(34465) idle
31462 be/4 postgres    980.00 K      3.50 M  0.00 %  2.57 % postgres: user production 192.168.0.1(34547) idle
32709 be/4 postgres    428.00 K      3.23 M  0.00 %  2.56 % postgres: user production 192.168.0.5(34323) idle
  685 be/4 postgres    748.00 K      3.59 M  0.00 %  2.41 % postgres: user production 192.168.0.3(34911) idle
  683 be/4 postgres    728.00 K      3.19 M  0.00 %  2.38 % postgres: user production 192.168.0.3(34868) idle
32765 be/4 postgres    464.00 K      3.76 M  0.00 %  2.21 % postgres: user production 192.168.0.3(58074) idle
32760 be/4 postgres    808.00 K      6.18 M  0.00 %  2.16 % postgres: user production 192.168.0.3(57958) idle
 1912 be/4 postgres    372.00 K      3.03 M  0.00 %  2.16 % postgres: user production 192.168.0.5(33743) idle
31446 be/4 postgres   1004.00 K      2.09 M  0.00 %  2.16 % postgres: user production 192.168.0.1(34539) idle
31460 be/4 postgres    584.00 K      2.74 M  0.00 %  2.10 % postgres: user production 192.168.0.1(34545) idle

5) vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  1 573424 321080  27124 28504352    0    0   143   618    0    4  2  0 91  7
 0  1 573424 320764  27124 28504496    0    0   104 15654 3788 4961  1  0 85 14
 0  1 573424 320684  27124 28504616    0    0   276 12736 4099 5374  0  1 84 15
 0  1 573424 319672  27124 28504900    0    0    80  7746 3624 4949  2  0 82 16
 0  1 573424 319180  27124 28504972    0    0    36 12489 3653 4761  2  0 86 12
 0  1 573424 318184  27132 28505000    0    0     8 10482 3413 4898  0  0 87 13
 0  1 573424 318424  27132 28505016    0    0     0  9564 2746 4290  0  0 87 13
 0  1 573424 318308  27168 28505016    0    0    36 10562 1895 2149  0  0 87 12
 0  3 573424 318208  27168 28505020    0    0    84 18529 3035 3265  1  0 85 14
 0  1 573424 318732  27176 28505080    0    0    84 14574 2986 3231  0  0 84 16
 0  2 573424 317588  27176 28505184    0    0     4  6681 1991 2207  2  1 86 12
 0  1 573424 316852  27176 28505260    0    0    76  7670 2910 3996  2  1 85 13
 0  1 573424 316632  27184 28505256    0    0     0  7186 2661 3740  0  0 87 12
 0  1 573424 316720  27188 28505260    0    0     0  2590 1731 2474  0  0 88 12
 0  1 573424 314252  27192 28505696    0    0   460 11612 1757 2431  0  0 82 18
 0  2 573424 313504  27192 28505724    0    0     0 19656 1775 2099  0  0 83 17
 0  3 573424 313300  27196 28505780    0    0   188  6237 2746 3193  2  0 80 17
 0  2 573424 312736  27200 28506348    0    0   804 18466 5014 6430  2  1 75 23
 2 35 573424 307564  27200 28509920    0    0  3912 16280 14377 15470 14  3 28 56
 0  5 573424 282848  27208 28533964    0    0  7484 27580 22017 25938 17  3 17 63
 1  5 573424 221100  27208 28563360    0    0  2852  3120 19639 28664 12  5 52 31
 0  4 573428 229912  26704 28519184    0    4  1208  5890 13976 20851 13  3 56 28
 0  2 573448 234680  26672 28513632    0   20     0 17204 1694 2636  0  0 71 28
 3  7 573452 220836  26644 28525548    0    4  1540 36370 27928 36551 17  5 50 27
 1  3 573488 234380  26556 28517416    0   36   584 19066 8275 9467  3  2 60 36
 0  1 573488 234496  26556 28517852    0    0    56 47429 3290 4310  0  0 79 20

6) sudo lsof - a hell of a lot of output, I can post it if anyone is interested :-)

#### Notes and thoughts  ##############################################################################

As you can see, even though I have moved the pg_xlog folder to the SSD array (md3) the by far largest amount of writes still goes to the regular HDD's (md2), which puzzles me - what can that be?
From stat 3) (the iostat) I notice that the SSD's doesn't seem to be something near fully utilized - maybe something else than just pg_xlog could be moved her? 
I have no idea if the amount of reads/writes is  within the acceptable/capable for my kind of hardware, or if it is far beyond?
In stat 3) (the iotop) it says that the RAID array (md2) is the most "waiting" part, does that taste like a root cause, or more like a symptom of some other bottleneck?

Thanks, for taking the time to look at by data! :-)