Re: Performances issues with SSD volume ?


 



----- Original Message -----

> From: Thomas SIMON <tsimon@xxxxxxxxxxx>
> To: Glyn Astill <glynastill@xxxxxxxxxxx>
> Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
> Sent: Friday, 22 May 2015, 16:29
> Subject: Re:  Performances issues with SSD volume ?
> 
> 
> Thomas
> 
> On 22/05/2015 11:37, Glyn Astill wrote:
>>>  From: Thomas SIMON <tsimon@xxxxxxxxxxx>
>>>  To: Glyn Astill <glynastill@xxxxxxxxxxx>
>>>  Cc: "pgsql-admin@xxxxxxxxxxxxxx" 
> <pgsql-admin@xxxxxxxxxxxxxx>
>>>  Sent: Thursday, 21 May 2015, 17:56
>>>  Subject: Re:  Performances issues with SSD volume ?
>>> 
>>>  On 21/05/2015 16:30, Glyn Astill wrote:
>>>>>>  I think at this point you could do with going back and trying to reproduce
>>>>>>  the issue, then trace back up to pg_stat_activity to see what activity could
>>>>>>  be causing the disk i/o.  I assume you've tried to reproduce the disk issues
>>>>>>  with a simple disk benchmark like bonnie++?
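(For reference, a query along these lines against pg_stat_activity should show what is running while the disks are saturated; just a sketch, assuming 9.2 or newer where the view has the pid, state and query columns - on older releases they are procpid and current_query:

  SELECT pid, usename, state, waiting,
         now() - query_start AS runtime, query
  FROM pg_stat_activity
  WHERE state <> 'idle'
  ORDER BY query_start;
)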
>>>>>  Yes, I think the same thing. I will probably do this tomorrow early
>>>>>  in the morning.
>>>>>  I tried to reproduce the disk issues with different stress tests like
>>>>>  bonnie, fio and tsung, and I used a more realistic scenario with pgreplay
>>>>>  to replay my production traffic from the postgresql logfile.
>>>>>  However, I'm not sure how to diagnose the performance issues.
>>>>>  I mean, if I see the SSDs are at 100% utilisation, how can I figure out
>>>>>  why their behaviour changes?
>>>>>
>>>>  Well, the disk benchmarks are purely to see what your disks are capable of,
>>>>  and to help with your initial tuning.
>>>>
>>>>  You need to trace back which processes are causing most of the IO you're
>>>>  seeing, as well as checking the postgresql logs; something like iotop, or
>>>>  dstat with the --top-bio option, might help you there.
>>>>
>>>>  You could also look at the pg_statio_user_tables view to narrow down which
>>>>  tables are being hit the hardest, which might give you some clues.
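(As an aside, one rough way to get "hardest hit" out of that view is simply to sort on the physical read counters; a sketch, nothing more:

  SELECT relname, heap_blks_read, idx_blks_read, toast_blks_read
  FROM pg_statio_user_tables
  ORDER BY heap_blks_read + coalesce(idx_blks_read, 0) DESC
  LIMIT 10;
)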
>>>  Is there something I need to enable to see data in this view?
>>>  It's empty on my production server:
>>> 
>>>  postgres=# select * from pg_statio_user_tables;
>>>   relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
>>>  -------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
>>>  (0 rows)
>>> 
>> 
>>  Looks like you need to set track_counts=on then.  In fact, if you've got
>>  track_counts off then you're also not running autovacuum; that's a
>>  warning flag unless it's intentional.
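(Checking that only takes a moment from psql, e.g.:

  SHOW track_counts;
  SELECT name, setting FROM pg_settings
  WHERE name IN ('track_counts', 'autovacuum');
)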
> Argh, no, it's just me doing something dumb ...
> I was connected to the postgres database, not to my application's database...
> 
> Here are the results for my top tables.
> 
>    relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read |  idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
> ---------+------------+---------+----------------+---------------+---------------+---------------+-----------------+----------------+----------------+---------------
>  2056411 | public     | table 1 |   326588572770 |  432317896616 |   12839634494 |   33230958888 |        52298815 |       20020628 |        9406665 |     153557966
>  2059324 | public     | table 2 |   223323685944 |   72194867978 |   10319078550 |  103672304590 |          189135 |         262100 |          27815 |        207212
>  2056368 | public     | table 3 |   198706578066 | 1259514828344 |   21083719219 | 1456776226532 |           90446 |         331680 |          30850 |        367441
>  2056347 | public     | table 4 |   124086575669 |  554775474730 |    5236764894 |  614913194098 |        14978587 |       67085822 |         751951 |     134187396
>  2057296 | public     | table 5 |    68120702371 |   18134568265 |    1243107028 |   10537765277 |       350646003 |       87888306 |      119113944 |     629790819
>  2058460 | public     | table 6 |    35119995405 |  158177258403 |    1687014495 | 1395923096049 |          738542 |         214574 |         325765 |       1318187
>  2058900 | public     | table 7 |    34205345184 |  271135546976 |   21050227933 |  252099340046 |            3924 |           6567 |            422 |         16333
>  2059195 | public     | table 8 |    29169256266 |  152937546457 |    2263998316 |  127782237565 |               0 |              0 |              0 |             0
> 
> How do you interpret these values?
> Do you check the ratio between heap_blks_read and heap_blks_hit?
> I've calculated that ratio; it's about 89%.

It shows blocks fetched from disk (or possibly from the OS buffer cache) as "read", and blocks already found in shared buffers as "hit".


See here: http://www.postgresql.org/docs/9.2/static/monitoring-stats.html#PG-STATIO-ALL-TABLES-VIEW
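As a rough sketch, that 89% figure can be pulled per table straight from the view, something like:

  SELECT relname, heap_blks_read, heap_blks_hit,
         round(heap_blks_hit::numeric
               / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS heap_hit_ratio
  FROM pg_statio_user_tables
  ORDER BY heap_blks_read DESC
  LIMIT 10;

Bear in mind a "read" satisfied by the OS cache still counts as a read here, so the ratio understates the overall cache effectiveness.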


Also, this thread now seems to be split over diagnosing the issue from two different standpoints: 1) is the I/O performance (or some other aspect) of your new SSD server worse than that of your old live spinning-disk system, and 2) are there improvements to be had in your database and application setup?

For 1), maybe you could post bonnie++ results from the data volumes on both servers when nothing else is going on on them.  If you can't do that on your production server, perhaps just the output of sar -d during a busy period would help.  Hopefully that would clarify the specific disk performance differences for us.  Another interesting test would be the output of Greg Smith's stream-scaling script (https://github.com/gregs1104/stream-scaling).  We're also still missing the exact specs of your old system and the CPUs in the new one.


For 2), I think we're still missing a lot of information about what's happening in your database to be able to say much, i.e. what's the size of the database and of tables 1 - 8, what are the typical read/write patterns, and what's the transaction rate?  Perhaps you could give a simple description of what your app(s) do on a daily basis, and then post a diff of the outputs of the pg_stat_database, pg_stat_user_tables and pg_statio_user_tables views over a specific period of time.
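One crude way to get that diff from psql, just as a sketch (statio_snap is a scratch table name I'm making up here):

  -- first snapshot
  CREATE TABLE statio_snap AS
    SELECT now() AS snap_time, * FROM pg_statio_user_tables;
  -- second snapshot, some hours later
  INSERT INTO statio_snap
    SELECT now(), * FROM pg_statio_user_tables;
  -- delta per table, assuming just those two snapshots exist
  SELECT b.relname,
         b.heap_blks_read - a.heap_blks_read AS heap_read_delta,
         b.heap_blks_hit  - a.heap_blks_hit  AS heap_hit_delta
  FROM statio_snap a
  JOIN statio_snap b ON b.relid = a.relid AND b.snap_time > a.snap_time
  ORDER BY heap_read_delta DESC;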

You really need to dig into each of these yourself to try and see where your new system might be lacking.


-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin




