Re: WAL + SSD = slow inserts?

Tomas Vondra <tv@xxxxxxxx> · Sat, 07 Dec 2013 00:59:26 +0100

On 5.12.2013 17:13, Skarsol wrote:
> On Thu, Dec 5, 2013 at 9:50 AM, Scott Marlowe <scott.marlowe@xxxxxxxxx
> <mailto:scott.marlowe@xxxxxxxxx>> wrote:
> 
>     On Thu, Dec 5, 2013 at 8:16 AM, Skarsol <skarsol@xxxxxxxxx
>     <mailto:skarsol@xxxxxxxxx>> wrote:
>     > psql (PostgreSQL) 9.2.5
>     > Red Hat Enterprise Linux Server release 6.4 (Santiago)
>     > Linux 2.6.32-358.6.1.el6.x86_64 #1 SMP Fri Mar 29 16:51:51 EDT
>     2013 x86_64
>     > x86_64 x86_64 GNU/Linux
>     > All relevant filesystems are ext4
>     >
>     > Changes from defaults:
>     > max_connections = 500
>     > shared_buffers = 32000MB
>     > temp_buffers = 24MB
>     > work_mem = 1GB
>     > maintenance_work_mem = 5GB
>     > wal_level = archive
>     > wal_buffers = 16MB
>     > checkpoint_completion_target = 0.9
>     > archive_mode = on
>     > archive_command = 'test ! -f /databases/pg_archive/db/%f && cp %p
>     > /databases/pg_archive/db/%f'
>     > effective_cache_size = 64000MB
>     > default_statistics_target = 5000
>     > log_checkpoints = on
>     > stats_temp_directory = '/tmp/pgstat'
> 
>     OK I'd make the following changes.
>     1: Drop shared_buffers to something like 1000MB
>     2: drop work_mem to 16MB or so. 1GB is pathological, as it can make
>     the machine run out of memory quite fast.
>     3: drop max_connections to 100 or so. if you really need 500 conns,
>     then work_mem of 1G is that much worse.
> 
>     Next, move pg_xlog OFF the SSDs and back onto spinning media and put
>     your data/base dir on the SSDs.
> 
>     SSDs aren't much faster, if at all, for pg_xlog, but are much much
>     faster for data/base files.
> 
>     Also changing the io schduler for the SSDs to noop:
> 
>     http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/block/switching-sched.txt?id=HEAD
> 
> 
> Changing the scheduler to noop seems to have had a decent effect. I've
> made the other recommended changes other than the connections as we do
> need that many currently. We're looking to implement pg_bouncer which
> should help with that.

I'm wondering if you left checkpoint_segments at the default? Try
bumping it up to 32 or more, otherwise it might cause frequent
checkpoints. I see you have log_checkpoints=on, so do you see any
checkpoint messages in the logs?

Also, how much data are you actually inserting? Are you inserting a
single row, or large number of them? What is the structure of the table?
Are there any foreign keys in it?

If you do a batch of such inserts (so that it takes a minute or so in
total), what do you see in top/iostat? Is the system CPU or IO bound?
Show us a dozen lines of

   iostat -x -k 1
   vmstat 1

> Moving the whole database to SSD isn't an option currently due to size.

Moving the WAL to SSDs is rather wasteful, in my experience. A RAID
controller with decent write cache (256MB or more) and BBU is both
faster and cheaper.

Also, there are huge differences between various SSDs vendors and
models, or even between the same SSD model with different firmware
versions. What SSD model are you using? Have you updated the firmware?

> The slowest inserts are happening on tables that are partitioned by
> creation time. As part of the process there is a rule to select curval
> from a sequence but there are no other selects or anything in  the
> trigger procedure. Could the sequence be slowing it down? I dont see a
> way to change the tablespace of one.

There are many possible causes for this - for example you're not telling
us something about the table structure (e.g. FK constraints might be
causing this) or about the hardware.

Have you done some tests on the SSD to verify it works properly? I've
seen broken firmwares behaving like this (unexpectedly high latencies in
random intervals etc.).

Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance