----- Original Message -----
> From: Glyn Astill <glynastill@xxxxxxxxxxx>
> To: Thomas SIMON <tsimon@xxxxxxxxxxx>
> Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
> Sent: Wednesday, 20 May 2015, 17:50
> Subject: Re: Performances issues with SSD volume ?
>
>> From: Thomas SIMON <tsimon@xxxxxxxxxxx>
>> To: glynastill@xxxxxxxxxxx
>> Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
>> Sent: Wednesday, 20 May 2015, 16:41
>> Subject: Re: Performances issues with SSD volume ?
>>
>> Hi Glyn,
>>
>> I'll try to answer these points.
>>
>> I've run some benchmarks, and indeed 3.2 is not helping, not helping
>> at all. I changed to 3.14 and the gap is quite big!
>> With the pgbench RW test: 3.2 --> 4200 TPS; 3.14 --> 6900 TPS, same
>> conditions.
>> With the pgbench RO test: 3.2 --> 37000 TPS; 3.14 --> 95000 TPS, same
>> conditions too.
>
> That's a start then.
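> For reference, the sort of invocation those numbers imply is below;
> the scale factor, client count and duration are illustrative guesses
> on my part, not your actual test parameters:
>
>   pgbench -i -s 100 bench                # initialise the test database
>   pgbench -c 64 -j 8 -T 600 bench        # RW (TPC-B-like) run
>   pgbench -c 64 -j 8 -T 600 -S bench     # RO (SELECT-only) run
>
> Whatever parameters you settle on, keep them identical across kernels
> so the comparison holds.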
>> Things should be better now then, but when the server was in
>> production, even with the bad kernel, performance was already quite
>> good before it quickly degraded.
>> So I also think I have another configuration problem.
>>
>>> You say you're IO bound, so some output from sar / iostat / dstat
>>> and pg_stat_activity etc before and during the issue would be of
>>> use.
>>
>> -> My server is not in production right now, so it is difficult to
>> replay the production load and gather useful metrics.
>> The best way I've found is to replay traffic from the logs with
>> pgreplay. I hoped the server would fall over again while replaying
>> this traffic, but it never happens... another thing I can't
>> understand...
>>
>> Below is my dstat output when I replay this traffic (and so when the
>> server runs normally).
>> Unfortunately I have no output from when the server's performance
>> degraded.
>
> It's a shame we can't get any insight into activity on the server
> during the issues.
>
>> Other things you asked:
>>
>> System memory size: 256 GB
>> SSD model numbers and how many: 4 SSD disks; RAID 10; model
>> INTEL SSDSC2BB480G4
>> RAID controller: MegaRAID SAS 2208
>> Partition alignments and stripe sizes: see fdisk below
>> Kernel options: the config file is here:
>> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>> Filesystem used and mount options: ext4, see mtab below
>> IO scheduler: noop [deadline] cfq for my SSD RAID volume
>> PostgreSQL version and configuration: 9.3.5
>>
>> max_connections=1800
>> shared_buffers=8GB
>> temp_buffers=32MB
>> work_mem=100MB
>> maintenance_work_mem=12GB
>> bgwriter_lru_maxpages=200
>> effective_io_concurrency=4
>> wal_level=hot_standby
>> wal_sync_method=fdatasync
>> wal_writer_delay=2000ms
>> commit_delay=1000
>> checkpoint_segments=80
>> checkpoint_timeout=15min
>> checkpoint_completion_target=0.7
>> archive_command='rsync ....'
>> max_wal_senders=10
>> wal_keep_segments=38600
>> vacuum_defer_cleanup_age=100
>> hot_standby = on
>> max_standby_archive_delay = 5min
>> max_standby_streaming_delay = 5min
>> hot_standby_feedback = on
>> random_page_cost = 1.0
>> effective_cache_size = 240GB
>> log_min_error_statement = warning
>> log_min_duration_statement = 0
>> log_checkpoints = on
>> log_connections = on
>> log_disconnections = on
>> log_line_prefix = '%m|%u|%d|%c|'
>> log_lock_waits = on
>> log_statement = 'all'
>> log_timezone = 'localtime'
>> track_activities = on
>> track_functions = pl
>> track_activity_query_size = 8192
>> autovacuum_max_workers = 5
>> autovacuum_naptime = 30s
>> autovacuum_vacuum_threshold = 40
>> autovacuum_analyze_threshold = 20
>> autovacuum_vacuum_scale_factor = 0.10
>> autovacuum_analyze_scale_factor = 0.10
>> autovacuum_vacuum_cost_delay = 5ms
>> default_transaction_isolation = 'read committed'
>> max_locks_per_transaction = 128
>>
>> Connection pool sizing (pgpool2):
>> num_init_children = 1790
>> max_pool = 1
>
> 1800 is quite a lot of connections, and with max_pool=1 in pgpool
> you're effectively just using pgpool as a proxy (as I recall; my
> memory is a little fuzzy on pgpool now). Unless your app is stateful
> in some way, or has unique users for each of those 1800 connections,
> you should lower the number of active connections. A general starting
> point is usually cpu cores * 2, so you could raise max_pool and
> divide num_init_children by the same factor (e.g. max_pool = 4 with
> num_init_children = 448 keeps roughly the same total).
>
> Hard to say what you need to do without knowing exactly what you're
> doing though. What's the nature of the app(s)?
>
>> I also added the megacli parameters:
>>
>> Virtual Drive: 2 (Target Id: 2)
>> Name                : datassd
>> RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
>> Size                : 893.25 GB
>> Sector Size         : 512
>> Is VD emulated      : Yes
>> Mirror Data         : 893.25 GB
>> State               : Optimal
>> Strip Size          : 256 KB
>> Number Of Drives per span: 2
>> Span Depth          : 2
>> Default Cache Policy: WriteThrough, ReadAheadNone, Direct,
>> No Write Cache if Bad BBU
>> Current Cache Policy: WriteThrough, ReadAheadNone, Direct,
>> No Write Cache if Bad BBU
>> Default Access Policy: Read/Write
>> Current Access Policy: Read/Write
>> Disk Cache Policy   : Enabled
>> Encryption Type     : None
>> Bad Blocks Exist: No
>> PI type: No PI
>>
>> Is VD Cached: No
>
> Not using your raid controller's write cache then? Not sure just how
> important that is with SSDs these days, but if you've got a BBU set
> the policy to "WriteBack", and if you do that also make sure "No
> Write Cache if Bad BBU" is set.
>
>> Other outputs:
>>
>> fdisk -l
>>
>> Disk /dev/sdc: 959.1 GB, 959119884288 bytes
>> 255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x00000000
>>
>> Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
>> 255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x00000000
>>
>> cat /etc/mtab
>>
>> /dev/mapper/vg_datassd-lv_datassd /datassd ext4
>> rw,relatime,discard,nobarrier,data=ordered 0 0
>> (I added the nobarrier option)
>>
>> cat /sys/block/sdc/queue/scheduler
>>
>> noop [deadline] cfq
>
> You could swap relatime for noatime,nodiratime.
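> A quick way to test that without touching fstab is a remount, using
> the mount point and options from your mtab above (safest to repeat
> the full option list rather than just the new ones):
>
>   mount -o remount,noatime,nodiratime,discard,nobarrier,data=ordered /datassd
>
> Then mirror the change in /etc/fstab if it helps, so it survives a
> reboot.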
> You could also see if the noop scheduler makes any improvement.
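> A non-persistent way to flip it for a test, using the device from
> your output above:
>
>   echo noop > /sys/block/sdc/queue/scheduler
>   cat /sys/block/sdc/queue/scheduler     # should now show [noop]
>
> And if you do try the write cache change above, it's along these
> lines with MegaCli; the -L2 / -a0 numbers are guesses from your
> "Virtual Drive: 2" output, so verify them against your controller
> first:
>
>   MegaCli -LDSetProp WB -L2 -a0              # WriteBack
>   MegaCli -LDSetProp NoCachedBadBBU -L2 -a0  # No Write Cache if Bad BBU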