There are other ways to influence the planner's choice between a seqscan
and an index scan, notably effective_cache_size and random_page_cost.
First, you need to find out at what point a seqscan actually becomes
faster than an index scan. That's going to be a trial-and-error
search, but if you go back far enough in time the seqscan will
eventually win. EXPLAIN ANALYZE has its own overhead, so a
better way to test this is with psql's \timing command, wrapping the
query in a count so you're not shoving a bunch of data across to psql:
SELECT count(*) FROM (... your query goes here ...) a;
(SELECT 1 might work too and would be more accurate)
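For example, something like this in psql (just a sketch using Jonas's
table and query from below; the cutoff interval is whatever you're
currently bisecting on):

\timing
SELECT count(*) FROM (
    SELECT max(date_time) FROM data_values
    WHERE date_time > now() - '14 weeks'::interval
    GROUP BY data_logger_id
) a;

Run it a few times at each cutoff so caching effects even out, and note
where the seqscan and index-scan timings cross.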
Once you've found the break-even point, you can tweak all the cost
estimates. Start by making sure that effective_cache_size is set to
roughly how much memory you have; increasing it will favor an index
scan. Decreasing random_page_cost will also favor an index scan,
though I'd try not to go below 2 and definitely not below 1.
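For example, in postgresql.conf (the values are only placeholders, size
them for your hardware; depending on the server version
effective_cache_size takes a number of 8kB pages or a memory unit):

effective_cache_size = '6GB'    # roughly the RAM the OS has for caching
random_page_cost = 3            # default is 4; lower favors index scans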
You can also tweak the CPU cost estimates (lower numbers will favor
indexes). But keep in mind that changing these at the system level will
impact every query running on the system, so you may have no choice but
to set custom parameters for just this statement. Wrapping the SELECT
in a transaction and using SET LOCAL is the less painful way to do
that, as sketched below.
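Something along these lines (a sketch only; the value for
random_page_cost is a placeholder):

BEGIN;
SET LOCAL random_page_cost = 2;   -- applies only to this transaction
SELECT max(date_time) FROM data_values GROUP BY data_logger_id;
COMMIT;                           -- the setting reverts here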
On May 7, 2007, at 10:47 AM, Jonas Henriksen wrote:
Well thanks, but that doesn't help me much.
I've tried setting an extra condition using date_time > (now() - '14
weeks'::interval):
explain analyze
SELECT max(date_time) FROM data_values
where date_time > (now() - '14 weeks'::interval)
GROUP BY data_logger_id;
HashAggregate  (cost=23264.52..23264.55 rows=2 width=12) (actual time=1691.447..1691.454 rows=3 loops=1)
  ->  Bitmap Heap Scan on data_values  (cost=7922.08..21787.31 rows=295442 width=12) (actual time=320.643..951.043 rows=298589 loops=1)
        Recheck Cond: (date_time > (now() - '98 days'::interval))
        ->  Bitmap Index Scan on data_values_data_date_time_index  (cost=0.00..7848.22 rows=295442 width=0) (actual time=319.708..319.708 rows=298589 loops=1)
              Index Cond: (date_time > (now() - '98 days'::interval))
Total runtime: 1691.598 ms
However, when I switch to using date_time > (now() - '15
weeks'::interval) I get:
explain analyze
SELECT max(date_time) FROM data_values
where date_time > (now() - '15 weeks'::interval)
GROUP BY data_logger_id;
HashAggregate  (cost=23798.26..23798.28 rows=2 width=12) (actual time=3237.816..3237.823 rows=3 loops=1)
  ->  Seq Scan on data_values  (cost=0.00..22084.62 rows=342728 width=12) (actual time=0.037..2409.234 rows=344111 loops=1)
        Filter: (date_time > (now() - '105 days'::interval))
Total runtime: 3237.944 ms
Doing "SET enable_seqscan=off" speeds up the query and forces the use
of the index, but I dont really love that solution...
regards Jonas:))
On 5/7/07, Peter Eisentraut <peter_e@xxxxxxx> wrote:
On Monday, 7 May 2007 at 15:53, Jonas Henriksen wrote:
> while if I add a GROUP BY data_logger the query uses a seq scan and a
> lot of time:
>
> explain analyze SELECT max(date_time) FROM data_values
> GROUP BY data_logger_id;
I don't think there is anything you can do about this.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
--
Jim Nasby jim@xxxxxxxxx
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)