
Re: Avoiding seq scan over 3.2 millions rows

"Andrus" <kobruleht2@xxxxxx> writes:
> explain analyze SELECT sum(xxx)
>    FROM dok JOIN rid USING (dokumnr)
>    WHERE  dok.kuupaev BETWEEN '2008-04-01' AND '2008-04-30'

> Hash Join  (cost=29584.84..308259.32 rows=142686 width=0) (actual time=68510.748..96932.174 rows=117883 loops=1)
>   Hash Cond: ("outer".dokumnr = "inner".dokumnr)
>   ->  Seq Scan on rid  (cost=0.00..195599.68 rows=3217368 width=4) (actual time=17.130..56572.857 rows=3247363 loops=1)
>   ->  Hash  (cost=29243.76..29243.76 rows=53231 width=4) (actual time=15878.782..15878.782 rows=44685 loops=1)
>         ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..29243.76 rows=53231 width=4) (actual time=47.102..15651.059 rows=44685 loops=1)
>               Index Cond: ((kuupaev >= '2008-04-01'::date) AND (kuupaev <= '2008-04-30'::date))
> Total runtime: 97364.282 ms

> The query performs a seq scan over 3.2 million rows.

There isn't anything particularly wrong with that plan.  The alternative
that you seem to be wishing for would involve ~50000 index probes into
"rid", which is hardly going to be free.
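One standard way to check whether the index-probe alternative would actually win is to steer the planner away from the seq scan for a single session and compare actual runtimes. A diagnostic sketch (enable_seqscan is meant for exactly this kind of experiment, not for production settings):

```sql
-- Diagnostic only: discourage the seq scan for this transaction,
-- then compare the resulting plan's actual time against the original.
BEGIN;
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE SELECT sum(xxx)
  FROM dok JOIN rid USING (dokumnr)
  WHERE dok.kuupaev BETWEEN '2008-04-01' AND '2008-04-30';

ROLLBACK;  -- SET LOCAL reverts automatically at transaction end
```

If the forced plan is no faster, the ~50000 probes into "rid" really are the more expensive path and the hash join was the right choice.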

You could try reducing random_page_cost to push the planner in the
direction of preferring the indexscan solution, but whether this is
actually better in your situation remains to be seen.
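The knob can be tried per-session before touching postgresql.conf; 2.0 below is only an illustrative value (the default is 4.0), not a recommendation:

```sql
-- Lower random_page_cost for this session only, to see whether the
-- planner then prefers the indexscan plan and whether it actually helps.
SET random_page_cost = 2.0;

EXPLAIN ANALYZE SELECT sum(xxx)
  FROM dok JOIN rid USING (dokumnr)
  WHERE dok.kuupaev BETWEEN '2008-04-01' AND '2008-04-30';

RESET random_page_cost;  -- back to the configured default
```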

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
