> "Andrus" <kobruleht2@xxxxxx> writes:
>> explain analyze SELECT sum(xxx)
>> FROM dok JOIN rid USING (dokumnr)
>> WHERE dok.kuupaev BETWEEN '2008-04-01' AND '2008-04-30'
>
>> "Hash Join  (cost=29584.84..308259.32 rows=142686 width=0) (actual time=68510.748..96932.174 rows=117883 loops=1)"
>> "  Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
>> "  ->  Seq Scan on rid  (cost=0.00..195599.68 rows=3217368 width=4) (actual time=17.130..56572.857 rows=3247363 loops=1)"
>> "  ->  Hash  (cost=29243.76..29243.76 rows=53231 width=4) (actual time=15878.782..15878.782 rows=44685 loops=1)"
>> "        ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..29243.76 rows=53231 width=4) (actual time=47.102..15651.059 rows=44685 loops=1)"
>> "              Index Cond: ((kuupaev >= '2008-04-01'::date) AND (kuupaev <= '2008-04-30'::date))"
>> "Total runtime: 97364.282 ms"
>
>> The query performs a seq scan over 3.2 million rows.
>
> There isn't anything particularly wrong with that plan.  The alternative
> that you seem to be wishing for would involve ~50000 index probes into
> "rid", which is hardly going to be free.
>
> You could try reducing random_page_cost to push the planner in the
> direction of preferring the indexscan solution, but whether this is
> actually better in your situation remains to be seen.

Or just set "enable_seqscan = off" - that penalizes the seq scan so the planner picks an index scan instead. Then run the EXPLAIN ANALYZE again and you'll see how fast the index scan actually is in this case.

You may also try some "standard" optimizations - I'd try clustering the 'rid' table on the "dokumnr" column, though I doubt it will outperform the seq scan in this case. If the rows are 'wide', you may try to 'compact' the table so that it can be read faster during the seq scan.
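As a rough sketch, the two experiments above could look like this in a psql session (the index name rid_dokumnr_idx is hypothetical - substitute whatever index actually exists on rid(dokumnr), and note that CLUSTER rewrites the table and takes an exclusive lock):

```sql
-- Disable seq scans for this session only, then re-run the query
-- to see what the index-scan plan really costs.
SET enable_seqscan = off;

EXPLAIN ANALYZE SELECT sum(xxx)
FROM dok JOIN rid USING (dokumnr)
WHERE dok.kuupaev BETWEEN '2008-04-01' AND '2008-04-30';

-- Restore the default before normal use.
SET enable_seqscan = on;

-- Physically reorder rid by dokumnr; assumes an index on rid(dokumnr)
-- named rid_dokumnr_idx (hypothetical). On 8.3+ the syntax is
-- CLUSTER rid USING rid_dokumnr_idx.
CLUSTER rid_dokumnr_idx ON rid;
ANALYZE rid;
```

enable_seqscan = off doesn't actually forbid seq scans, it just makes them look astronomically expensive to the planner, so it's a diagnostic knob rather than something to leave on in production.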
Try something like

  CREATE TABLE rid2 AS SELECT dokumnr, xxx FROM rid;

regards
Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general