Re: Performances issues with SSD volume ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> From: Thomas SIMON <tsimon@xxxxxxxxxxx>
> To: glynastill@xxxxxxxxxxx
> Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
> Sent: Wednesday, 20 May 2015, 16:41
> Subject: Re:  Performances issues with SSD volume ?
> 
> Hi Glyn,
> 
> I'll try to answer this points.
> 
> I've made some benchs, and indeed 3.2 not helping. not helping at all.
> I changed to 3.14 and gap is quite big !
> With pgbench RW test, 3.2 --> 4200 TPS ; 3.14 --> 6900 TPS in same 
> conditions
> With pgbench RO test, 3.2 --> 37000 TPS ; 3.14 --> 95000 TPS, same 
> conditions too.

> 


That's a start then.

> It should so be better, but when server was in production, and ever with 
> bad kernel, performances was already quite good before they quickly 
> decreased.
> So i think too I have another configuration problem.
> 
> You say you're IO bound, so some output from sar / iostat / dstat and 
> pg_stat_activity etc before and during the issue would be of use.
> 
> -> My server is not in production right now, so it is difficult to 
> replay production load and have some useful metrics.
> The best way I've found is to replay trafic from logs with pgreplay.
> I hoped that the server falls back by replaying this traffic, but it 
> never happens ... Another thing I can't understand ...
> 
> Below is my dstat output when I replay this traffic (and so when server 
> runs normally)
> I have unfortunately no more outputs when server's performances decreased.

> 

It's a shame we can't get any insight into activity on the server during the issues.
> 
> 
> Other things you asked
> 
>      System memory size : 256 Go
>      SSD Model numbers and how many : 4 SSd disks ; RAID 10 ; model 
> INTEL SSDSC2BB480G4
>      Raid controller : MegaRAID SAS 2208
>      Partition alignments and stripe sizes : see fdisk delow
>      Kernel options : the config file is here : 
> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>      Filesystem used and mount options : ext4, see mtab below
>      IO Scheduler : noop [deadline] cfq for my ssd raid volume
>      Postgresql version and configuration : 9.3.5
> 
> max_connections=1800
> shared_buffers=8GB
> temp_buffers=32MB
> work_mem=100MB
> maintenance_work_mem=12GB
> bgwriter_lru_maxpages=200
> effective_io_concurrency=4
> wal_level=hot_standby
> wal_sync_method=fdatasync
> wal_writer_delay=2000ms
> commit_delay=1000
> checkpoint_segments=80
> checkpoint_timeout=15min
> checkpoint_completion_target=0.7
> archive_command='rsync ....'
> max_wal_senders=10
> wal_keep_segments=38600
> vacuum_defer_cleanup_age=100
> hot_standby = on
> max_standby_archive_delay = 5min
> max_standby_streaming_delay = 5min
> hot_standby_feedback = on
> random_page_cost = 1.0
> effective_cache_size = 240GB
> log_min_error_statement = warning
> log_min_duration_statement = 0
> log_checkpoints = on
> log_connections = on
> log_disconnections = on
> log_line_prefix = '%m|%u|%d|%c|'
> log_lock_waits = on
> log_statement = 'all'
> log_timezone = 'localtime'
> track_activities = on
> track_functions = pl
> track_activity_query_size = 8192
> autovacuum_max_workers = 5
> autovacuum_naptime = 30s
> autovacuum_vacuum_threshold = 40
> autovacuum_analyze_threshold = 20
> autovacuum_vacuum_scale_factor = 0.10
> autovacuum_analyze_scale_factor = 0.10
> autovacuum_vacuum_cost_delay = 5ms
> default_transaction_isolation = 'read committed'
> max_locks_per_transaction = 128
> 
> 
> 
>      Connection pool sizing (pgpool2)
> num_init_children = 1790
> max_pool = 1

> 


1800 is quite a lot of connections, and with max_pool=1 in pgpool you're effectively just using pgpool as a proxy (as I recall, my memory is a little fuzzy on pgpool now).  Unless your app is stateful in some way or has unique users for each of those 1800 connections you should lower the quantity of active connections.  A general starting point is usually cpu cores * 2, so you could up max_pool and divide num_init_children by the same amount.

Hard to say what you need to do without knowing what exactly you're doing though.  What's the nature of the app(s)?

> I also add megacli parameters :
> 
> Virtual Drive: 2 (Target Id: 2)
> Name                :datassd
> RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
> Size                : 893.25 GB
> Sector Size         : 512
> Is VD emulated      : Yes
> Mirror Data         : 893.25 GB
> State               : Optimal
> Strip Size          : 256 KB
> Number Of Drives per span:2
> Span Depth          : 2
> Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write 
> Cache if Bad BBU
> Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write 
> Cache if Bad BBU
> Default Access Policy: Read/Write
> Current Access Policy: Read/Write
> Disk Cache Policy   : Enabled
> Encryption Type     : None
> Bad Blocks Exist: No
> PI type: No PI
> 
> Is VD Cached: No

> 


Not using your raid controllers write cache then?  Not sure just how important that is with SSDs these days, but if you've got a BBU set it to "WriteBack". Also change "Cache if Bad BBU" to "No Write Cache if Bad BBU" if you do that.


> 
> Other outputs :
>      fdisk -l
> 
> Disk /dev/sdc: 959.1 GB, 959119884288 bytes
> 255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x00000000
> 
> Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
> 255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x00000000
> 
> 
>      cat /etc/mtab
> /dev/mapper/vg_datassd-lv_datassd /datassd ext4 
> rw,relatime,discard,nobarrier,data=ordered 0 0
> (I added nobarrier option)
> 
> 
>      cat /sys/block/sdc/queue/scheduler
> noop [deadline] cfq
> 

> 


You could swap relatime for noatime,nodiratime.

>      sysctl kernel | grep sched
> kernel.sched_child_runs_first = 0
> kernel.sched_rr_timeslice_ms = 25
> kernel.sched_rt_period_us = 1000000
> kernel.sched_rt_runtime_us = 950000
> 
> I've read some advices about changing kernel.sched_autogroup_enabled=0 
> and kernel.sched_migration_cost_ns=5000000, but this parameters are not 
> recognized by my kernel. So I don't know what to do with that...

> 

sched_migration_cost_ns would be called sched_migration_cost in your old 3.2 kernel, not sure why 
sched_autogroup_enabled wouldn't be recognized though.


>      sysctl vm
> vm.admin_reserve_kbytes = 8192
> vm.block_dump = 0
> vm.dirty_background_bytes = 8388608
> vm.dirty_background_ratio = 0
> vm.dirty_bytes = 67108864
> vm.dirty_expire_centisecs = 3000
> vm.dirty_ratio = 0
> vm.dirty_writeback_centisecs = 500
> vm.drop_caches = 3
> vm.laptop_mode = 0
> vm.legacy_va_layout = 0
> vm.lowmem_reserve_ratio = 256    256    32
> vm.max_map_count = 65530
> vm.memory_failure_early_kill = 0
> vm.memory_failure_recovery = 1
> vm.min_free_kbytes = 65008
> vm.min_slab_ratio = 5
> vm.min_unmapped_ratio = 1
> vm.mmap_min_addr = 4096
> vm.nr_pdflush_threads = 0
> vm.numa_zonelist_order = default
> vm.oom_dump_tasks = 1
> vm.oom_kill_allocating_task = 0
> vm.overcommit_kbytes = 0
> vm.overcommit_memory = 2
> vm.overcommit_ratio = 50
> vm.page-cluster = 3
> vm.panic_on_oom = 0
> vm.percpu_pagelist_fraction = 0
> vm.scan_unevictable_pages = 0
> vm.stat_interval = 1
> vm.swappiness = 60
> vm.user_reserve_kbytes = 131072
> vm.vfs_cache_pressure = 100
> vm.zone_reclaim_mode = 0
> 
> 
>      select * from pg_stat_activity
> I've got hundred of entries for that when i'm in production, and I 
> can't 
> paste it here due to confidentiality.
> By day, it is around 50 millions queries usually. (35% selects ; 55% 
> updates & 5% inserts)
> 
> 
>      lspci | grep -E 'RAID|SCSI|IDE|SATA'
> 00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 
> 6-Port SATA AHCI Controller (rev 06)
> 02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 
> [Thunderbolt] (rev 05)
> 07:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset 
> 4-Port SATA Storage Control Unit (rev 06)
> 
> Thanks
> 
> Thomas
> 
> 
> Le 18/05/2015 16:29, Glyn Astill a écrit :
>>>  From: Koray Eyidoğan <korayey@xxxxxxxxx>
>>>  To: Thomas SIMON <tsimon@xxxxxxxxxxx>
>>>  Cc: pgsql-admin@xxxxxxxxxxxxxx
>>>  Sent: Monday, 18 May 2015, 14:51
>>>  Subject: Re:  Performances issues with SSD volume ?
>>> 
>>> 
>>> 
>>>  Hi Thomas,
>>> 
>>> 
>>>  3.2 kernel may be #1 cause of your I/O load problem:
>>> 
>>> 
>>> 
> http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
>>> 
>>> 
> https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
>>> 
>>> 
>>> 
>>>  Have a nice day.
>>> 
>>> 
>>>  Koray
>> 
>>  Likely 3.2 kernel isn't helping, but I think we need much more 
> information before jumping to conclusions.
>> 
>>  You say you're IO bound, so some output from sar / iostat / dstat and 
> pg_stat_activity etc before and during the issue would be of use.
>> 
>>  Also:
>>       System memory size
>> 
>>       SSD Model numbers and how many
>>       Raid controller
>>       Partition allignments and stripe sizes
>>       Kernel options
>>       Filesystem used and mount options
>>       IO Scheduler
>>       Postgresql version and configuration
>>       Connection pool sizing
>> 
>>   
>>  Perhaps you could thow us the output of some of these:
>> 
>>       fdisk -l
>>       cat /etc/mtab
>>       cat /sys/block/<ssd device>/queue/scheduler
>>       sysctl kernel | grep sched
>>       sysctl vm
>>       select * from pg_stat_activity
>>       select name, setting from pg_settings
>>       lspci | grep -E 'RAID|SCSI|IDE|SATA'
> 
> 
> 
> -- 
> Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>


-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux