Thomas
On 22/05/2015 18:21, Glyn Astill wrote:
----- Original Message -----
From: Thomas SIMON <tsimon@xxxxxxxxxxx>
To: Glyn Astill <glynastill@xxxxxxxxxxx>
Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
Sent: Friday, 22 May 2015, 16:29
Subject: Re: Performances issues with SSD volume ?
Thomas
On 22/05/2015 11:37, Glyn Astill wrote:
From: Thomas SIMON <tsimon@xxxxxxxxxxx>
To: Glyn Astill <glynastill@xxxxxxxxxxx>
Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
Sent: Thursday, 21 May 2015, 17:56
Subject: Re: Performances issues with SSD volume ?
On 21/05/2015 16:30, Glyn Astill wrote:
I think at this point you could do with going back and trying to reproduce the issue, then trace back up to pg_stat_activity to see what activity could be causing the disk I/O. I assume you've tried to reproduce the disk issues with a simple disk benchmark like bonnie++?
Yes, I think the same thing. I will probably do this early tomorrow morning.
I tried to reproduce the disk issues with different stress tests like bonnie++, fio and tsung, and I used a more realistic scenario with pgreplay to replay my production traffic from the postgresql logfile.
However, I'm not sure how to diagnose the performance issues. I mean, if I see the SSDs are at 100% utilisation, how can I figure out why their behaviour changes?
Well, the disk benchmarks are purely to see what your disks are capable of, and to help with your initial tuning.
You need to trace back which processes are causing most of the I/O you're seeing, as well as checking the postgresql logs; something like iotop, or dstat with the --top-bio option, might help you there.
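If it helps, something like this (just a sketch, assuming 9.2 or later where pg_stat_activity exposes pid, state and query) lets you cross-reference the PIDs that iotop or dstat report against what those backends are actually running:

  -- match PIDs reported by iotop / dstat --top-bio to active backends
  select pid, datname, usename, state,
         now() - query_start as query_runtime,
         query
  from pg_stat_activity
  where state <> 'idle'
  order by query_runtime desc nulls last;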
You could also look at the pg_statio_user_tables view to narrow down which tables are being hit the hardest, which might give you some clues.
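For example, something along these lines (just a sketch) would surface the tables doing the most heap reads from disk:

  -- top 10 tables by blocks read from disk (or the OS cache)
  select relname, heap_blks_read, heap_blks_hit,
         idx_blks_read, idx_blks_hit
  from pg_statio_user_tables
  order by heap_blks_read desc
  limit 10;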
Is there something I need to enable to see data in this view? It's empty on my production server:
postgres=# select * from pg_statio_user_tables;
relid | schemaname | relname | heap_blks_read | heap_blks_hit |
idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit |
tidx_blks_read | tidx_blks_hit
-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
(0 rows)
Looks like you need to set track_counts=on then. In fact, if you've got track_counts off then you're also not running autovacuum; that's a warning flag unless it's intentional.
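A quick way to check from psql (just a sketch):

  show track_counts;
  show autovacuum;
  -- or everything stats/autovacuum related in one go:
  select name, setting from pg_settings
  where name like 'track%' or name like 'autovacuum%';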
Ah, no, it's just me doing something silly... I was connected to the postgres database, not to my application's database.
Here is the result for my top tables:
 relid   | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit  | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
---------+------------+---------+----------------+---------------+---------------+---------------+-----------------+----------------+----------------+--------------
 2056411 | public     | table 1 |   326588572770 |  432317896616 |   12839634494 |   33230958888 |        52298815 |       20020628 |        9406665 |     153557966
 2059324 | public     | table 2 |   223323685944 |   72194867978 |   10319078550 |  103672304590 |          189135 |         262100 |          27815 |        207212
 2056368 | public     | table 3 |   198706578066 | 1259514828344 |   21083719219 | 1456776226532 |           90446 |         331680 |          30850 |        367441
 2056347 | public     | table 4 |   124086575669 |  554775474730 |    5236764894 |  614913194098 |        14978587 |       67085822 |         751951 |     134187396
 2057296 | public     | table 5 |    68120702371 |   18134568265 |    1243107028 |   10537765277 |       350646003 |       87888306 |      119113944 |     629790819
 2058460 | public     | table 6 |    35119995405 |  158177258403 |    1687014495 | 1395923096049 |          738542 |         214574 |         325765 |       1318187
 2058900 | public     | table 7 |    34205345184 |  271135546976 |   21050227933 |  252099340046 |            3924 |           6567 |            422 |         16333
 2059195 | public     | table 8 |    29169256266 |  152937546457 |    2263998316 |  127782237565 |               0 |              0 |              0 |             0
How do you interpret these values? Do you check the ratio between heap_blks_read and heap_blks_hit? I've computed that ratio; it's about 89%.
It shows blocks fetched from disk, or possibly from the OS buffer cache, as "read", and blocks already in shared buffers as "hit".
See here: http://www.postgresql.org/docs/9.2/static/monitoring-stats.html#PG-STATIO-ALL-TABLES-VIEW
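If it's an overall buffer hit ratio you're after, something like this is one way to compute it (just a sketch; note it can't tell OS cache hits apart from real disk reads):

  select sum(heap_blks_hit)::numeric
           / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as heap_hit_ratio,
         sum(idx_blks_hit)::numeric
           / nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0)   as idx_hit_ratio
  from pg_statio_user_tables;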
Also, this thread now appears to be split over diagnosing the issue from two different standpoints: 1) Is the I/O performance (or some other aspect) of your new SSD server worse than that of your old live spinning-disk system? 2) Are there improvements to be had with your database and application setup?
For 1), maybe you could post the output of bonnie++ runs against the data volumes on both servers when nothing else is going on on them. If you can't do that on your production server, perhaps just the output from sar -d during a busy period might help. Hopefully that would clarify the specific disk performance differences for us. Another interesting test would be the output of Greg Smith's stream-scaling script (https://github.com/gregs1104/stream-scaling). We're still missing the exact specs of your old system and the CPUs in the new one.
I'll stay on this 1st point for now, because that is the origin of my problem and the most urgent thing right now.
I can't run bonnie++ now, because I've already prepared the server as a slave and I don't have sufficient disk space available (the program says the file size should be double the RAM for good results).
My current sar output on the production (HDD) server is:
14:25:01         DEV           tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
11:05:01  vg_data-lv_data   954.37   2417.35  18941.77     22.38     31.10  32.46   0.34  32.70
11:15:01  vg_data-lv_data  1155.79   8716.91  21995.77     26.57     25.15  21.74   0.40  46.70
11:25:01  vg_data-lv_data  1250.62   6478.67  23450.07     23.93     39.77  31.78   0.41  51.34
11:35:01  vg_data-lv_data   842.48   2051.11  17120.92     22.76     15.63  18.53   0.29  24.04
11:45:01  vg_data-lv_data   666.21   1403.32  14174.47     23.38     10.11  15.12   0.24  15.79
11:55:01  vg_data-lv_data   923.51   6763.36  15337.58     23.93     13.07  14.14   0.35  32.63
12:05:01  vg_data-lv_data   989.86   9148.71  16252.59     25.66     19.42  19.56   0.45  44.21
12:15:01  vg_data-lv_data  1369.24   8631.93  24737.60     24.37     35.04  25.54   0.45  61.33
12:25:01  vg_data-lv_data  1776.12   7070.01  39851.34     26.42     74.81  42.05   0.44  77.29
12:35:01  vg_data-lv_data  1529.15   6635.80  85865.14     60.49     54.11  35.34   0.48  72.89
12:45:01  vg_data-lv_data  1187.43   4528.74  40366.95     37.81     36.07  30.36   0.39  45.81
12:55:01  vg_data-lv_data   984.48   3520.06  21539.36     25.45     17.91  18.17   0.31  30.20
13:05:01  vg_data-lv_data   926.54   6304.44  16688.94     24.82     17.36  18.69   0.41  38.05
13:15:01  vg_data-lv_data  1232.46   7199.65  29852.49     30.06     40.17  32.53   0.42  51.60
13:25:01  vg_data-lv_data  1223.46   3945.05  27448.15     25.66     31.07  25.31   0.35  42.65
13:35:01  vg_data-lv_data  1126.91   2811.70  22067.19     22.08     24.33  21.55   0.32  36.00
13:45:01  vg_data-lv_data   833.33   1805.26  17274.43     22.90     24.40  29.25   0.30  25.41
13:55:02  vg_data-lv_data  1085.88   7616.75  19140.67     24.64     17.48  16.06   0.39  42.15
14:05:01  vg_data-lv_data   691.52   3852.50  13125.53     24.55      7.75  11.15   0.30  20.74
14:15:01  vg_data-lv_data  1288.88   5390.41  24171.07     22.94     33.31  25.76   0.36  46.31
14:25:01  vg_data-lv_data  1592.88   3637.77  29836.89     21.02     76.45  47.94   0.40  63.28
14:35:01  vg_data-lv_data  1652.78   9502.87  31587.68     24.86     58.97  35.58   0.44  72.46
14:45:01  vg_data-lv_data  1623.82   6249.52  34148.46     24.88     53.47  32.83   0.40  65.19
14:55:01  vg_data-lv_data  1330.44   6516.11  26828.59     25.06     55.66  41.81   0.42  55.46
Average:  vg_data-lv_data  1176.55   5508.02  26324.37     27.06     33.86  28.72   0.39  45.59
The output of stream-scaling is:
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu0 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu0 Level 3 Cache: 25600K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu1 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 3 Cache: 25600K (Unified)
... (for 40 cpus)
Total CPU system cache: 1060372480 bytes
Suggested minimum array elements needed: 481987490
WARNING: Array size may not fit into a 32 bit structure.
If stream fails to compile, you may need to uncomment the
line in the script labeled and described by the "Size
clamp code" comments in the stream-scaling script.
Array elements used: 481987490
=== CPU Core Summary ===
processor : 39
model name : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
cpu MHz : 1273.925
siblings : 20
=== Check and build stream ===
/tmp/ccuVZwf5.o: In function `tuned_STREAM_Triad._omp_fn.0':
stream.c:(.text+0x5a): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0x8b): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0x98): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0xe0): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0xe7): relocation truncated to fit: R_X86_64_32S against
`.bss'
stream.c:(.text+0x14b): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x15b): relocation truncated to fit: R_X86_64_32S
against `.bss'
/tmp/ccuVZwf5.o: In function `tuned_STREAM_Add._omp_fn.1':
stream.c:(.text+0x1f4): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x1fd): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x23b): relocation truncated to fit: R_X86_64_32S
against `.bss'
stream.c:(.text+0x242): additional relocation overflows omitted from the
output
collect2: error: ld returned 1 exit status
=== Trying large memory model ===
(this can take a while to compile)
=== Testing up to 40 cores ===
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 481987490, Offset = 0
Total memory required = 11031.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 1
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 520522 microseconds.
(= 520522 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 9552.4291 0.8205 0.8073 0.8319
Scale: 9818.6724 0.8128 0.7854 0.8333
Add: 10600.6443 1.1113 1.0912 1.1557
Triad: 10598.8774 1.1073 1.0914 1.1301
-------------------------------------------------------------
Failed Validation on array c[]
Expected : 148233746338593734656.000000
Observed : 148233747862815801344.000000
-------------------------------------------------------------
Number of Threads requested = 2
Function Rate (MB/s) Avg time Min time Max time
Triad: 26541.4003 0.4667 0.4358 0.5130
Failed Validation on array c[]
Expected : 148233746338593734656.000000
Observed : 148233747862815801344.000000
Number of Threads requested = 3
Function Rate (MB/s) Avg time Min time Max time
Triad: 30890.3475 0.3798 0.3745 0.3842
Failed Validation on array c[]
Expected : 148233746338593734656.000000
Observed : 148233747862815801344.000000
...
Number of Threads requested = 40
Function Rate (MB/s) Avg time Min time Max time
Triad: 53770.7791 0.2186 0.2151 0.2258
Failed Validation on array c[]
Expected : 148233746338593734656.000000
Observed : 148233747862815801344.000000
For hardware specs, both servers have the same CPU & RAM: 40 x Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, and 256 GB RAM.
I'll do the switch between master & slave tomorrow morning, so I should have more information on the SSD server at that point (sar, dstat reports, ...).
For 2), I think we're still missing a lot of information about what's happening in your database to be able to say much, i.e. what's the size of the database, of tables 1-8, the typical read/write patterns, and the transaction rate? Perhaps you could give a simple description of what your app(s) do on a daily basis, and then post a diff of the outputs of the pg_stat_database, pg_stat_user_tables and pg_statio_user_tables views over a specific period of time.
You really need to dig into each of these yourself to try and see where your new system might be lacking.
Quick answer on the principal values for the db:
size ~ 150 GB.
Standard behaviour is 55% update, 40% select & 10% insert, at around 170 TPS.
I've set up recording of pg_stat_database to gather stats and will try to analyse the other tables.
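Something along these lines (just a sketch; stats_snapshot is a hypothetical table name) is one way to record pg_stat_database periodically so the deltas can be diffed later:

  -- one-off: create the snapshot table from the current stats
  create table stats_snapshot as
    select now() as snap_time, *
    from pg_stat_database
    where datname = current_database();

  -- then periodically (e.g. from cron):
  insert into stats_snapshot
    select now(), *
    from pg_stat_database
    where datname = current_database();

  -- TPS over an interval is roughly the delta of xact_commit + xact_rollback
  -- divided by the number of seconds between two snapshots.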