On 07/05/2011 07:26 PM, Clem Dickey wrote:
Updates after belatedly reading the "slow queries" guidelines:
Version: PostgreSQL 8.4.8 on x86_64-redhat-linux-gnu, compiled by GCC
gcc (GCC) 4.4.5 20101112 (Red Hat 4.4.5-2), 64-bit
The query has always been slow; the table for this test case is never
updated. I don't run VACUUM but do run ANALYZE.
Originally all database config parameters were the default. Since
yesterday I have changed
shared_buffers = 224MB
effective_cache_size = 1024MB
but seen no change in behavior.
Column | Type | Modifiers
--------+---------+-----------
y | integer | not null
x | integer | not null
k | integer | not null
j | integer | not null
z | integer | not null
Indexes:
"t_pkey" PRIMARY KEY, btree (j, k, x, y, z)
The table population, in pseudocode, is this:
for x in 0..9
for y in 0..9999
for z in 0..29
INSERT INTO t VALUES(y,x,0,0,z)
The query is:
SELECT *
FROM (
SELECT * FROM t GROUP BY j,k,x,z,y
) AS f
NATURAL JOIN t;
The EXPLAIN ANALYZE output is http://explain.depesz.com/s/KGk
Notes on the analysis:
1. I see that the planner estimates that GROUP BY will reduce 300K rows
to 30K, a bit odd because every row which the planner could examine is
in a unique group.
2. The JOIN is expected to produce one row. I'm not sure how the planner
came up with that estimate.
By contrast, placing an intermediate result in a table "u" provides a
result in about 3 seconds:
=> EXPLAIN ANALYZE CREATE TABLE u AS SELECT * FROM t GROUP BY
j,k,x,z,y;EXPLAIN ANALYZE SELECT * FROM u NATURAL JOIN t;DROP TABLE u;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Group (cost=44508.90..49008.90 rows=30000 width=20) (actual
time=1305.381..2028.385 rows=300000 loops=1)
-> Sort (cost=44508.90..45258.90 rows=300000 width=20) (actual
time=1305.374..1673.843 rows=300000 loops=1)
Sort Key: j, k, x, z, y
Sort Method: external merge Disk: 8792kB
-> Seq Scan on t (cost=0.00..4911.00 rows=300000 width=20)
(actual time=0.008..62.935 rows=300000 loops=1)
Total runtime: 2873.590 ms
(6 rows)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=46229.86..72644.38 rows=1 width=20) (actual
time=1420.527..2383.507 rows=300000 loops=1)
Merge Cond: ((t.j = u.j) AND (t.k = u.k) AND (t.x = u.x) AND (t.y =
u.y) AND (t.z = u.z))
-> Index Scan using t_pkey on t (cost=0.00..14877.18 rows=300000
width=20) (actual time=0.013..118.244 rows=300000 loops=1)
-> Materialize (cost=46229.86..50123.52 rows=311493 width=20)
(actual time=1420.498..1789.864 rows=300000 loops=1)
-> Sort (cost=46229.86..47008.59 rows=311493 width=20)
(actual time=1420.493..1692.988 rows=300000 loops=1)
Sort Key: u.j, u.k, u.x, u.y, u.z
Sort Method: external merge Disk: 8784kB
-> Seq Scan on u (cost=0.00..5025.93 rows=311493
width=20) (actual time=0.018..78.850 rows=300000 loops=1)
Total runtime: 2424.870 ms
(9 rows)
(Adding an "ANALYZE" on the temporary table improves the JOIN estimated
fow count from 1 to about 299500, but does not change the plan.)
Clem Dickey
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance