Server crashed and now experiencing slow running queries

Keaton Adams <kadams@xxxxxxxxxxx> · Tue, 04 Dec 2007 22:50:39 -0700

Title: Server crashed and now experiencing slow running queries

We have two servers configured the same way running the same type of processes that write/read to the database.  Server 2 filled up pg_xlog and crashed.  When it came back we began to experience slow query performance.  I ran an ANALYZE against the tables involved in the query, but for some reason the optimizer is still choosing a hash join instead of a nested loop join, which is killing performance.  The query on Server 1 runs in 14 seconds and the same query on Server 2 runs in 15 minutes:

Server 1 

                           ->  HashAggregate  (cost=501922.84..501922.85 rows=1 width=532)

                                 ->  Nested Loop  (cost=250961.41..501922.83 rows=1 width=532)

                                       Join Filter: (("outer".host)::text = ("inner".host)::text)

                                       ->  HashAggregate  (cost=250961.40..250961.41 rows=1 width=26)

                                             ->  Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size  (cost=0.01..250961.40 rows=1 width=26)

                                                   Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))

                                       ->  Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh  (cost=0.01..250961.40 rows=1 width=42)

                                             Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))

Server 2

                           ->  HashAggregate  (cost=1814101.48..1814129.36 rows=2230 width=532)

                                 ->  Hash Join  (cost=906978.28..1814079.18 rows=2230 width=532)

                                       Hash Cond: (("outer".host)::text = ("inner".host)::text)

                                       ->  Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size lh  (cost=0.01..906877.88 rows=40147 width=42)

                                             Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))

                                       ->  Hash  (cost=906978.27..906978.27 rows=1 width=516)

                                             ->  HashAggregate  (cost=906978.25..906978.26 rows=1 width=26)

                                                   ->  Index Scan using mxl_fs_size_did_cre_idx on mxl_fs_size  (cost=0.01..906877.88 rows=40147 width=26)

                                                         Index Cond: ((created >= (now() - '01:00:00'::interval)) AND (created <= now()))

Besides ANALYZE, what else can I do / look at to figure out why the optimizer is making the choices it is on Server 2, now causing slow performance problems?

Thanks,

Keaton