From: Glyn Astill <glynastill@xxxxxxxxxxx>
To: Thomas SIMON <tsimon@xxxxxxxxxxx>
Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
Sent: Wednesday, 20 May 2015, 17:50
Subject: Re: Performances issues with SSD volume ?
From: Thomas SIMON <tsimon@xxxxxxxxxxx>
To: glynastill@xxxxxxxxxxx
Cc: "pgsql-admin@xxxxxxxxxxxxxx"
<pgsql-admin@xxxxxxxxxxxxxx>
Sent: Wednesday, 20 May 2015, 16:41
Subject: Re: Performances issues with SSD volume ?
Hi Glyn,
I'll try to answer these points.
I've run some benchmarks, and indeed kernel 3.2 is not helping. Not helping at all.
I changed to 3.14 and the gap is quite big!
With the pgbench RW test, 3.2 --> 4200 TPS ; 3.14 --> 6900 TPS under the same
conditions.
With the pgbench RO test, 3.2 --> 37000 TPS ; 3.14 --> 95000 TPS, same
conditions too.
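For reference, runs of this kind can be reproduced with stock pgbench; the
scale factor, client count, and database name below are illustrative
assumptions, not values taken from this thread:

```shell
# Hypothetical pgbench invocations for comparable RW/RO runs.
# Scale (-s), clients (-c), threads (-j) and db name are assumptions.
pgbench -i -s 300 bench             # initialise the test database
pgbench -c 32 -j 8 -T 300 bench     # RW: default TPC-B-like read/write test
pgbench -S -c 32 -j 8 -T 300 bench  # RO: SELECT-only test
```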
That's a start then.
So things should be better now, but when the server was in production, even
with the bad kernel, performance was already quite good before it quickly
degraded.
So I too think I have another configuration problem.
You say you're IO bound, so some output from sar / iostat / dstat and
pg_stat_activity etc before and during the issue would be of use.
-> My server is not in production right now, so it is difficult to
replay the production load and gather useful metrics.
The best way I've found is to replay traffic from the logs with pgreplay.
I hoped the server would break down again while replaying this traffic, but it
never happens ... another thing I can't understand ...
Below is my dstat output while I replay this traffic (i.e. while the server
runs normally).
Unfortunately I have no output from the period when the server's performance
degraded.
It's a shame we can't get any insight into activity on the server during
the issues.
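If it does happen again, something along these lines would capture the basics
(standard sar / iostat / psql usage; the pg_stat_activity columns shown are
the 9.3 ones, and device names / intervals are assumptions):

```shell
# Capture system and PostgreSQL activity during a slowdown.
iostat -dxm 5 > iostat.log &    # extended per-device I/O stats every 5s
sar -u -q 5 > sar.log &         # CPU usage and run-queue length every 5s

# Longest-running statements first (9.3 still has the boolean "waiting" column)
psql -c "SELECT pid, state, waiting, now() - query_start AS runtime, query
         FROM pg_stat_activity
         ORDER BY runtime DESC;" > pg_activity.log
```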
Other things you asked:
System memory size: 256 GB
SSD model numbers and how many: 4 SSD disks; RAID 10; model
INTEL SSDSC2BB480G4
RAID controller: MegaRAID SAS 2208
Partition alignments and stripe sizes: see fdisk output below
Kernel options: the config file is here:
ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
Filesystem used and mount options: ext4, see mtab below
IO scheduler: noop [deadline] cfq for my SSD RAID volume
PostgreSQL version and configuration: 9.3.5
max_connections=1800
shared_buffers=8GB
temp_buffers=32MB
work_mem=100MB
maintenance_work_mem=12GB
bgwriter_lru_maxpages=200
effective_io_concurrency=4
wal_level=hot_standby
wal_sync_method=fdatasync
wal_writer_delay=2000ms
commit_delay=1000
checkpoint_segments=80
checkpoint_timeout=15min
checkpoint_completion_target=0.7
archive_command='rsync ....'
max_wal_senders=10
wal_keep_segments=38600
vacuum_defer_cleanup_age=100
hot_standby = on
max_standby_archive_delay = 5min
max_standby_streaming_delay = 5min
hot_standby_feedback = on
random_page_cost = 1.0
effective_cache_size = 240GB
log_min_error_statement = warning
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_line_prefix = '%m|%u|%d|%c|'
log_lock_waits = on
log_statement = 'all'
log_timezone = 'localtime'
track_activities = on
track_functions = pl
track_activity_query_size = 8192
autovacuum_max_workers = 5
autovacuum_naptime = 30s
autovacuum_vacuum_threshold = 40
autovacuum_analyze_threshold = 20
autovacuum_vacuum_scale_factor = 0.10
autovacuum_analyze_scale_factor = 0.10
autovacuum_vacuum_cost_delay = 5ms
default_transaction_isolation = 'read committed'
max_locks_per_transaction = 128
Connection pool sizing (pgpool2)
num_init_children = 1790
max_pool = 1
1800 is quite a lot of connections, and with max_pool=1 in pgpool you're
effectively just using pgpool as a proxy (as I recall; my memory of pgpool is a
little fuzzy now). Unless your app is stateful in some way, or has unique
users for each of those 1800 connections, you should lower the number of active
connections. A general starting point is usually CPU cores * 2, so you could
raise max_pool and divide num_init_children by the same factor.
Hard to say what you need to do without knowing exactly what you're doing,
though. What's the nature of the app(s)?
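As a rough sketch, assuming a 32-core machine (the core count is an assumption,
not stated in this thread), that advice would look something like this in
pgpool.conf, keeping the num_init_children * max_pool product close to the
original 1790:

```
# Hypothetical pgpool.conf sizing sketch, assuming 32 CPU cores.
# Concurrent client slots drop to ~cores * 2; each child can cache
# connections for up to max_pool distinct user/database pairs.
num_init_children = 64
max_pool = 28          # 64 * 28 = 1792, close to the original 1790 * 1
```

The key effect is that only 64 backends can be active at once instead of 1790,
which usually reduces context-switch and lock contention on the server.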
I've also included the megacli parameters:
Virtual Drive: 2 (Target Id: 2)
Name :datassd
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 893.25 GB
Sector Size : 512
Is VD emulated : Yes
Mirror Data : 893.25 GB
State : Optimal
Strip Size : 256 KB
Number Of Drives per span:2
Span Depth : 2
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Bad Blocks Exist: No
PI type: No PI
Is VD Cached: No
Not using your RAID controller's write cache then? I'm not sure just how
important that is with SSDs these days, but if you've got a BBU set it to
"WriteBack". Also change "Cache if Bad BBU" to "No Write Cache if Bad BBU"
if you do that.
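Something like the following should do it; the adapter and virtual-drive
numbers (-a0, -L2) are assumptions based on the listing above, so check them
against your own MegaCli output first:

```shell
# Hypothetical MegaCli invocations to enable the BBU-backed write cache
# on virtual drive 2 of adapter 0 (verify IDs before running).
MegaCli64 -LDSetProp WB -L2 -a0              # WriteBack while the BBU is healthy
MegaCli64 -LDSetProp NoCachedBadBBU -L2 -a0  # fall back to WriteThrough if the BBU fails
```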
Other outputs :
fdisk -l
Disk /dev/sdc: 959.1 GB, 959119884288 bytes
255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000
Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000
cat /etc/mtab
/dev/mapper/vg_datassd-lv_datassd /datassd ext4
rw,relatime,discard,nobarrier,data=ordered 0 0
(I added the nobarrier option)
cat /sys/block/sdc/queue/scheduler
noop [deadline] cfq
You could swap relatime for noatime,nodiratime.
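For example, keeping your other options, a remount along these lines would
apply it without a reboot (on recent kernels noatime implies nodiratime, but
listing both is harmless):

```shell
# Remount /datassd with noatime in place of relatime; the other
# options mirror the mtab entry above.
mount -o remount,noatime,nodiratime,discard,nobarrier,data=ordered /datassd
```

Remember to update the matching line in /etc/fstab as well so the change
survives a reboot.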