Re: Performances issues with SSD volume ?

----- Original Message -----
> From: Glyn Astill <glynastill@xxxxxxxxxxx>
> To: Thomas SIMON <tsimon@xxxxxxxxxxx>
> Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
> Sent: Wednesday, 20 May 2015, 17:50
> Subject: Re:  Performances issues with SSD volume ?
> 
> 
> 
>>  From: Thomas SIMON <tsimon@xxxxxxxxxxx>
>>  To: glynastill@xxxxxxxxxxx
>>  Cc: "pgsql-admin@xxxxxxxxxxxxxx" 
> <pgsql-admin@xxxxxxxxxxxxxx>
>>  Sent: Wednesday, 20 May 2015, 16:41
>>  Subject: Re:  Performances issues with SSD volume ?
>> 
>>  Hi Glyn,
>> 
>>  I'll try to answer these points.
>> 
>>  I've run some benchmarks, and indeed kernel 3.2 is not helping. Not helping at all.
>>  I changed to 3.14 and the gap is quite big!
>>  With the pgbench read/write test: 3.2 --> 4200 TPS ; 3.14 --> 6900 TPS under the same
>>  conditions.
>>  With the pgbench read-only test: 3.2 --> 37000 TPS ; 3.14 --> 95000 TPS, same
>>  conditions too.
> 
>> 
> 
> 
> That's a start then.
> 
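For reference, the kind of pgbench invocations that would produce read/write and
read-only TPS figures like those above (the scale factor, client and thread counts
and database name here are placeholders, not the values actually used in these runs):

    # initialise a test database; -s is the scale factor
    pgbench -i -s 100 bench
    # read/write (default TPC-B-like) run
    pgbench -c 32 -j 8 -T 300 bench
    # read-only (SELECT-only) run
    pgbench -S -c 32 -j 8 -T 300 bench
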
>>  So it should be better now, but when the server was in production, even with
>>  the bad kernel, performance was already quite good before it quickly
>>  decreased.
>>  So I too think I have another configuration problem.
>> 
>>  You say you're IO bound, so some output from sar / iostat / dstat and 
>>  pg_stat_activity etc before and during the issue would be of use.
>> 
>>  -> My server is not in production right now, so it is difficult to
>>  replay the production load and gather useful metrics.
>>  The best way I've found is to replay traffic from the logs with pgreplay.
>>  I hoped the server would fall over again while replaying this traffic, but it
>>  never happens ... Another thing I can't understand ...
>> 
>>  Below is my dstat output while replaying this traffic (i.e. while the server
>>  runs normally).
>>  Unfortunately I have no output from when the server's performance
>>  decreased.
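As a rough sketch of the pgreplay workflow being described (file names, paths and
connection details are placeholders; the logging settings already in the
postgresql.conf further down -- log_min_duration_statement = 0, log_connections,
log_disconnections and the '%m|%u|%d|%c|' log_line_prefix -- are the ones pgreplay
needs, but check the pgreplay documentation for the exact options of your version):

    # parse the PostgreSQL log into a replay file
    pgreplay -f -o /tmp/replay.file /var/log/postgresql/postgresql.log
    # replay it against the test server (-s would change the replay speed factor)
    pgreplay -r -h localhost -p 5432 /tmp/replay.file
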
> 
>> 
> 
> It's a shame we can't get any insight into activity on the server during 
> the issues.
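If the problem does show up again, a minimal set of snapshots to capture while it is
happening might look like this (intervals are just suggestions; the pg_stat_activity
columns are the 9.3 ones):

    # device-level I/O, extended stats every 5 seconds
    iostat -x 5
    # CPU, run queue and swap activity every 5 seconds
    vmstat 5
    # what the backends are doing right now
    psql -c "SELECT pid, state, waiting, now() - query_start AS runtime, query
             FROM pg_stat_activity
             WHERE state <> 'idle'
             ORDER BY runtime DESC;"
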
>> 
>> 
>>  Other things you asked about:
>> 
>>       System memory size : 256 GB
>>       SSD model numbers and how many : 4 SSD disks ; RAID 10 ; model
>>  INTEL SSDSC2BB480G4
>>       RAID controller : MegaRAID SAS 2208
>>       Partition alignments and stripe sizes : see fdisk output below
>>       Kernel options : the config file is here :
>>  ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>>       Filesystem used and mount options : ext4, see mtab below
>>       IO scheduler : noop [deadline] cfq for my SSD RAID volume
>>       PostgreSQL version and configuration : 9.3.5
>> 
>>  max_connections=1800
>>  shared_buffers=8GB
>>  temp_buffers=32MB
>>  work_mem=100MB
>>  maintenance_work_mem=12GB
>>  bgwriter_lru_maxpages=200
>>  effective_io_concurrency=4
>>  wal_level=hot_standby
>>  wal_sync_method=fdatasync
>>  wal_writer_delay=2000ms
>>  commit_delay=1000
>>  checkpoint_segments=80
>>  checkpoint_timeout=15min
>>  checkpoint_completion_target=0.7
>>  archive_command='rsync ....'
>>  max_wal_senders=10
>>  wal_keep_segments=38600
>>  vacuum_defer_cleanup_age=100
>>  hot_standby = on
>>  max_standby_archive_delay = 5min
>>  max_standby_streaming_delay = 5min
>>  hot_standby_feedback = on
>>  random_page_cost = 1.0
>>  effective_cache_size = 240GB
>>  log_min_error_statement = warning
>>  log_min_duration_statement = 0
>>  log_checkpoints = on
>>  log_connections = on
>>  log_disconnections = on
>>  log_line_prefix = '%m|%u|%d|%c|'
>>  log_lock_waits = on
>>  log_statement = 'all'
>>  log_timezone = 'localtime'
>>  track_activities = on
>>  track_functions = pl
>>  track_activity_query_size = 8192
>>  autovacuum_max_workers = 5
>>  autovacuum_naptime = 30s
>>  autovacuum_vacuum_threshold = 40
>>  autovacuum_analyze_threshold = 20
>>  autovacuum_vacuum_scale_factor = 0.10
>>  autovacuum_analyze_scale_factor = 0.10
>>  autovacuum_vacuum_cost_delay = 5ms
>>  default_transaction_isolation = 'read committed'
>>  max_locks_per_transaction = 128
>> 
>> 
>> 
>>       Connection pool sizing (pgpool2)
>>  num_init_children = 1790
>>  max_pool = 1
> 
>> 
> 
> 
> 1800 is quite a lot of connections, and with max_pool=1 in pgpool you're 
> effectively just using pgpool as a proxy (as I recall, my memory is a little 
> fuzzy on pgpool now).  Unless your app is stateful in some way or has unique 
> users for each of those 1800 connections you should lower the quantity of active 
> connections.  A general starting point is usually cpu cores * 2, so you could up 
> max_pool and divide num_init_children by the same amount.
> 
> Hard to say what you need to do without knowing what exactly you're doing 
> though.  What's the nature of the app(s)?
> 
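As a concrete illustration of that suggestion (the factor of 4 below is arbitrary,
purely to show the shape of the change; the real target should come from the actual
core count):

    # pgpool.conf -- sketch only
    num_init_children = 448     # was 1790, divided by 4
    max_pool = 4                # was 1, multiplied by 4

The worst-case number of backend connections is still num_init_children * max_pool,
so max_connections in postgresql.conf has to stay at least that high; the gain is
fewer concurrently active clients hitting the server at once.
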
>>  I've also included the megacli parameters :
>> 
>>  Virtual Drive: 2 (Target Id: 2)
>>  Name                :datassd
>>  RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
>>  Size                : 893.25 GB
>>  Sector Size         : 512
>>  Is VD emulated      : Yes
>>  Mirror Data         : 893.25 GB
>>  State               : Optimal
>>  Strip Size          : 256 KB
>>  Number Of Drives per span:2
>>  Span Depth          : 2
>>  Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
>>  Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
>>  Default Access Policy: Read/Write
>>  Current Access Policy: Read/Write
>>  Disk Cache Policy   : Enabled
>>  Encryption Type     : None
>>  Bad Blocks Exist: No
>>  PI type: No PI
>> 
>>  Is VD Cached: No
> 
>> 
> 
> 
> Not using your raid controller's write cache then?  Not sure just how important 
> that is with SSDs these days, but if you've got a BBU set it to 
> "WriteBack", and keep "No Write Cache if Bad BBU" set if you do that.
> 
> 
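If the controller does have a working BBU, switching the logical drive to write-back
would look roughly like this with MegaCli (-L2 matches "Virtual Drive: 2" above;
exact option spelling varies a little between MegaCli versions, so treat this as a
sketch and check it against your version's help output):

    # enable write-back on virtual drive 2, adapter 0
    MegaCli -LDSetProp WB -L2 -a0
    # keep the safety fallback of no write cache when the BBU is bad
    MegaCli -LDSetProp NoCachedBadBBU -L2 -a0
    # verify the resulting cache policy
    MegaCli -LDInfo -L2 -a0
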
>> 
>>  Other outputs :
>>       fdisk -l
>> 
>>  Disk /dev/sdc: 959.1 GB, 959119884288 bytes
>>  255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
>>  Units = sectors of 1 * 512 = 512 bytes
>>  Sector size (logical/physical): 512 bytes / 4096 bytes
>>  I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>  Disk identifier: 0x00000000
>> 
>>  Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
>>  255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
>>  Units = sectors of 1 * 512 = 512 bytes
>>  Sector size (logical/physical): 512 bytes / 4096 bytes
>>  I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>  Disk identifier: 0x00000000
>> 
>> 
>>       cat /etc/mtab
>>  /dev/mapper/vg_datassd-lv_datassd /datassd ext4 
>>  rw,relatime,discard,nobarrier,data=ordered 0 0
>>  (I added the nobarrier option)
>> 
>> 
>>       cat /sys/block/sdc/queue/scheduler
>>  noop [deadline] cfq
>> 
> 
>> 
> 
> 
> You could swap relatime for noatime,nodiratime.
> 
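For example, tested live first and then made permanent in /etc/fstab (device and
mount point taken from the mtab output above):

    mount -o remount,noatime,nodiratime,discard,nobarrier /datassd

    # /etc/fstab
    /dev/mapper/vg_datassd-lv_datassd  /datassd  ext4  noatime,nodiratime,discard,nobarrier,data=ordered  0  0
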


You could also see if the noop scheduler makes any improvement.
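
That can be tested on the fly without a reboot (it does not persist, so it would need
an init script or udev rule to stick):

    # current scheduler for the SSD volume
    cat /sys/block/sdc/queue/scheduler
    # switch to noop
    echo noop > /sys/block/sdc/queue/scheduler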


-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin



