Hi, I'm running Postgres 9.1.1 on an EC2 m1.xlarge instance. The machine is a dedicated master with 2 streaming replication nodes; it has 16GB of RAM and 4 cores.

We're starting to see slow queries, and slow COMMITs in particular are happening more and more frequently. The slow queries are against seemingly well-indexed tables. I have log_min_duration_statement = 150ms.

Slow commits look like:

2011-11-14 17:47:11 UTC pid:14366 (44/0-0) LOG: duration: 3062.784 ms statement: COMMIT
2011-11-14 17:47:11 UTC pid:14604 (48/0-0) LOG: duration: 2593.351 ms statement: COMMIT

These slow COMMITs are against tables that receive a large number of UPDATEs and are growing fairly rapidly.

And slow queries look like:

2011-11-14 17:50:20 UTC pid:6519 (16/0-0) LOG: duration: 1694.456 ms statement: SELECT "facebook_wall_posts".* FROM "facebook_wall_posts" WHERE "facebook_wall_posts"."token" = '984c44e75975b224b38197cf8f8fc76a' LIMIT 1

Query plan: http://explain.depesz.com/s/wbm

The # of rows in facebook_wall_posts is 5841.

Structure of facebook_wall_posts:

                         Table "public.facebook_wall_posts"
       Column       |            Type             |                             Modifiers
--------------------+-----------------------------+------------------------------------------------------------------
 id                 | integer                     | not null default nextval('facebook_wall_posts_id_seq'::regclass)
 album_id           | integer                     | not null
 token              | character varying(32)       | not null
 fb_recipient_id    | character varying(64)       | not null
 post_id            | character varying(100)      | not null
 fb_post_created_at | timestamp without time zone |
 data               | text                        |
 created_at         | timestamp without time zone |
 updated_at         | timestamp without time zone |
 fb_post_deleted_at | timestamp without time zone |
Indexes:
    "facebook_wall_posts_pkey" PRIMARY KEY, btree (id)
    "index_facebook_wall_posts_on_token" UNIQUE, btree (token)
    "index_facebook_wall_posts_on_album_id" btree (album_id)

And another slow query:

2011-11-14 17:52:44 UTC pid:14912 (58/0-0) LOG: duration: 979.740 ms statement: SELECT "facebook_friends"."friend_id" FROM "facebook_friends" WHERE "facebook_friends"."user_id" = 9134671

Query plan: http://explain.depesz.com/s/x1q

The # of rows in facebook_friends is 27075088.

Structure of facebook_friends:

                      Table "public.facebook_friends"
   Column   |            Type             |                           Modifiers
------------+-----------------------------+---------------------------------------------------------------
 id         | integer                     | not null default nextval('facebook_friends_id_seq'::regclass)
 user_id    | integer                     | not null
 friend_id  | integer                     | not null
 created_at | timestamp without time zone |
Indexes:
    "facebook_friends_pkey" PRIMARY KEY, btree (id)
    "index_facebook_friends_on_user_id_and_friend_id" UNIQUE, btree (user_id, friend_id)

We have autovacuum enabled and running, and yesterday I also ran a manual vacuum on the database.

Autovacuum settings:

 autovacuum                      | on
 autovacuum_analyze_scale_factor | 0.1
 autovacuum_analyze_threshold    | 50
 autovacuum_freeze_max_age       | 200000000
 autovacuum_max_workers          | 3
 autovacuum_naptime              | 60
 autovacuum_vacuum_cost_delay    | 20
 autovacuum_vacuum_cost_limit    | -1
 autovacuum_vacuum_scale_factor  | 0.2
 autovacuum_vacuum_threshold     | 50

Other postgresql.conf settings:

shared_buffers = 3584MB
wal_buffers = 16MB
checkpoint_segments = 32
max_wal_senders = 10
checkpoint_completion_target = 0.9
wal_keep_segments = 1024
maintenance_work_mem = 256MB
work_mem = 88MB
effective_cache_size = 10GB

The PGDATA dir is a RAID 10 across 4 local ("ephemeral" in EC2 speak) drives.
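In case table bloat is relevant here (that token lookup is against a table with only 5841 rows), I can also post dead-tuple counts and last-autovacuum times for the hot tables; I'd collect them with something along these lines (just a sketch against the standard 9.1 pg_stat_user_tables view):

-- top 10 tables by dead tuples, with the last (auto)vacuum timestamps
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 10;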
I ran some dd tests and got the following output:

--- WRITING ---
root@sql03:/data# time sh -c "dd if=/dev/zero of=/data/tmp/bigfile bs=8k count=4000000 && sync"
4000000+0 records in
4000000+0 records out
32768000000 bytes (33 GB) copied, 670.663 s, 48.9 MB/s

real    11m52.199s
user    0m2.720s
sys     0m45.330s

--- READING ---
root@sql03:/data# time dd of=/dev/zero if=/data/tmp/bigfile bs=8k
4000000+0 records in
4000000+0 records out
32768000000 bytes (33 GB) copied, 155.429 s, 211 MB/s

real    2m35.434s
user    0m2.400s
sys     0m33.160s

I have enabled log_checkpoints, and here is a recent sample from the log:

2011-11-14 17:38:48 UTC pid:3965 (-0) LOG: checkpoint complete: wrote 15121 buffers (3.3%); 0 transaction log file(s) added, 0 removed, 8 recycled; write=270.101 s, sync=2.989 s, total=273.112 s; sync files=60, longest=1.484 s, average=0.049 s
2011-11-14 17:39:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
2011-11-14 17:43:49 UTC pid:3965 (-0) LOG: checkpoint complete: wrote 16462 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 9 recycled; write=269.978 s, sync=4.106 s, total=274.117 s; sync files=82, longest=2.943 s, average=0.050 s
2011-11-14 17:44:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
2011-11-14 17:48:47 UTC pid:3965 (-0) LOG: checkpoint complete: wrote 14159 buffers (3.1%); 0 transaction log file(s) added, 0 removed, 6 recycled; write=269.818 s, sync=2.119 s, total=271.948 s; sync files=71, longest=1.192 s, average=0.029 s
2011-11-14 17:49:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
2011-11-14 17:53:47 UTC pid:3965 (-0) LOG: checkpoint complete: wrote 11337 buffers (2.5%); 0 transaction log file(s) added, 0 removed, 7 recycled; write=269.901 s, sync=2.508 s, total=272.419 s; sync files=71, longest=1.867 s, average=0.035 s
2011-11-14 17:54:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
2011-11-14 17:58:48 UTC pid:3965 (-0) LOG: checkpoint complete: wrote 15706 buffers (3.4%); 0 transaction log file(s) added, 0 removed, 7 recycled; write=270.104 s, sync=3.612 s, total=273.727 s; sync files=67, longest=3.051 s, average=0.053 s
2011-11-14 17:59:15 UTC pid:3965 (-0) LOG: checkpoint starting: time

I've also been collecting random samples from pg_stat_bgwriter: https://gist.github.com/4faec2ca9a79ede281e1

So, given all this information (if you need more, just let me know): is there something fundamentally wrong or misconfigured? Do I have an I/O issue?

Thanks for any insight.
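P.S. By "random samples" from pg_stat_bgwriter I just mean periodic SELECTs against the view, roughly like the sketch below (the now() column is only there to timestamp each sample for correlation with the checkpoint log; the rest are the standard 9.1 pg_stat_bgwriter columns):

-- one sample row; run periodically (e.g. from cron) and append to a log
SELECT now() AS sampled_at,
       checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,
       buffers_clean,
       buffers_backend,
       buffers_backend_fsync,
       buffers_alloc
  FROM pg_stat_bgwriter;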