,some of our queries to the database taking long time to return the
results.
fsync: off (even we tested this parameter is on ,we observed the same
slowness )
If your queries take long time to return results, I suppose you are
talking about SELECTs.
fsync = off will not make SELECTs faster (only inserts, updates, deletes)
but it is not worth it as you risk data loss.
synchronous_commit = on has about the same advantages (faster...) as
fsync=off, but with no risk of data loss, so it is much better !
We have 300k row's in PolledData Table.In each STATSDATA table ,we have
almost 12 to 13 million rows.
OK. So you insert 13 million rows per day ?
That is about 150 rows per second.
Every one minute interval ,we insert data into to STATSDATA table.
I assume you are making an INSERT INTO statsdata VALUES (...... 150
values .....)
and not 150 inserts, yes ?
First Query :
SELECT COUNT(*) FROM (
SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM PolledData, STATSDATA8_21_2009 WHERE
( ( PolledData.ID=STATSDATA8_21_2009.POLLID)
AND ( ( TTIME >= 1250838027454)
AND ( TTIME <=1250838079654) ) ) ) t1;
* You could rewrite as :
SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM PolledData
JOIN STATSDATA8_21_2009 ON ( PolledData.ID = STATSDATA8_21_2009.POLLID)
WHERE TTIME BETWEEN ... AND ...
- It is exactly the same query, but much easier to read.
* some ANALYZE-ing of your tables would be useful, since the estimates
from the planner look suspiciously different from reality
- ANALYZE is fast, you can run it often if you INSERT rows all the time
* You are joining on POLLID which is a NUMERIC in one table and a BIGINT
in the other table.
- Is there any reason for this type difference ?
- Could you use BIGINT in both tables ?
- BIGINT is faster than NUMERIC and uses less space.
- Type conversions use CPU cycles too.
* Should StatsData.ID have a foreign key REFERENCES PolledData.ID ?
- This won't make the query faster, but if you know all rows in StatsData
reference rows in PolledData (because of the FK constraint) and you want a
count(*) like above, you don't need to JOIN.
* TTIME >= 1250838027454 AND TTIME <=1250838079654
- TTIME should be TIMESTAMP (with or without TIMEZONE) or BIGINT but
certainly not NUMERIC
- An index on StatsData.TTIME would be useful, it would avoid Seq Scan,
replacing it with a Bitmap Scan, much faster
* work_mem
- since you have few connections you could increase work_mem
Second Query :
Same as first query
Third Query
SELECT COUNT(*) FROM ( SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME,
VAL
FROM PolledData, STATSDATA8_21_2009
WHERE ( ( PolledData.ID=STATSDATA8_21_2009.POLLID)
AND ( ( TTIME >= 1250838027454) AND ( TTIME <=1250838027454) ) )
union all SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL FROM
PolledData, STATSDATA8_20_2009
WHERE ( ( PolledData.ID=STATSDATA8_20_2009.POLLID) AND ( ( TTIME >=
1250767134601) AND ( TTIME <= 1250767134601) ) ) )t1 ;
Basically this is, again, exactly the same query as above, but two times,
and UNION ALL'ed
* You could rewrite it like this :
SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM
( SELECT ... FROM STATSDATA8_21_2009 WHERE TTIME BETWEEN ... AND ... )
UNION ALL SELECT ... FROM STATSDATA8_20_2009 WHERE TTIME BETWEEN ... AND
... )
)
JOIN STATSDATA8_21_2009 ON ( PolledData.ID = STATSDATA8_21_2009.POLLID)
* If TTIME is the current time, and you insert data as it comes, data in
StatsData tables is probably already ordered on TTIME.
- If it is not the case, once a table is filled and becomes read-only,
consider CLUSTER on the index you created on TTIME
- It will make range queries on TTIME much faster
* Query plan
Seq Scan on statsdata8_21_2009 (cost=0.00..70574.88 rows=1 width=32)
(actual time=0.047..29066.227 rows=227 loops=1)
Seq Scan on statsdata8_20_2009 (cost=0.00..382519.60 rows=1 width=32)
(actual time=3136.008..93985.540 rows=1 loops=1)
Postgres thinks there is 1 row in those tables... that's probably not the
case !
The first one returns 227 rows, so the plan chosen in a catastrophe.
I was a bit intrigued by your query, so I made a little test...
BEGIN;
CREATE TABLE test( x INT, y INT );
INSERT INTO test (SELECT n,n FROM generate_series( 1,1000000 ) AS n );
CREATE INDEX test_x ON test( x );
CREATE INDEX test_y ON test( y );
COMMIT;
ANALYZE test;
test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x) WHERE
a.x BETWEEN 0 AND 10000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=480.53..23759.14 rows=10406 width=16) (actual
time=15.614..1085.085 rows=10000 loops=1)
Hash Cond: (b.x = a.x)
-> Seq Scan on test b (cost=0.00..14424.76 rows=999976 width=8)
(actual time=0.013..477.516 rows=1000000 loops=1)
-> Hash (cost=350.46..350.46 rows=10406 width=8) (actual
time=15.581..15.581 rows=10000 loops=1)
-> Index Scan using test_x on test a (cost=0.00..350.46
rows=10406 width=8) (actual time=0.062..8.537 rows=10000 loops=1)
Index Cond: ((x >= 0) AND (x <= 10000))
Total runtime: 1088.462 ms
(7 lignes)
test=> set enable_seqscan TO 0;
SET
test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x) WHERE
a.x BETWEEN 0 AND 10000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..30671.03 rows=10406 width=16) (actual
time=0.075..85.897 rows=10000 loops=1)
-> Index Scan using test_x on test a (cost=0.00..350.46 rows=10406
width=8) (actual time=0.066..8.377 rows=10000 loops=1)
Index Cond: ((x >= 0) AND (x <= 10000))
-> Index Scan using test_x on test b (cost=0.00..2.90 rows=1 width=8)
(actual time=0.005..0.006 rows=1 loops=10000)
Index Cond: (b.x = a.x)
Total runtime: 90.160 ms
(6 lignes)
test=> set enable_nestloop TO 0;
SET
test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x) WHERE
a.x BETWEEN 0 AND 10000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.00..31200.45 rows=10406 width=16) (actual
time=0.081..35.735 rows=10000 loops=1)
Merge Cond: (a.x = b.x)
-> Index Scan using test_x on test a (cost=0.00..350.46 rows=10406
width=8) (actual time=0.059..8.093 rows=10000 loops=1)
Index Cond: ((x >= 0) AND (x <= 10000))
-> Index Scan using test_x on test b (cost=0.00..28219.98 rows=999976
width=8) (actual time=0.016..7.494 rows=10001 loops=1)
Total runtime: 40.013 ms
(6 lignes)
I wonder why it doesn't choose the merge join at first...
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance