On Thu, 2006-09-21 at 23:39 -0400, Jim Nasby wrote:
> On Sep 14, 2006, at 11:15 AM, Russ Brown wrote:
> > We recently upgraded our trac backend from sqlite to postgres, and I
> > decided to have a little fun and write some reports that delve into
> > trac's subversion cache, and got stuck with a query optimisation
> > problem.
> >
> > Table revision contains 2800+ rows.
> > Table node_change contains 370000+.
> <...>
> > I've got stuck with this query:
> >
> > SELECT author, COUNT(DISTINCT r.rev)
> > FROM revision AS r
> > LEFT JOIN node_change AS nc
> >   ON r.rev = nc.rev
> > WHERE r.time >= EXTRACT(epoch FROM (NOW() - interval '30
> > days'))::integer
>
> Man I really hate when people store time_t in a database...
>

I know. Probably something to do with database engine independence. I
don't know if sqlite even has a date type (probably does, but I haven't
checked).

> > GROUP BY r.author;
> >
> > Statistics are set to 20, and I have ANALYZEd both tables.
> >
> > The report itself isn't important, but I'm using this as an
> > exercise in PostgreSQL query optimisation and planner tuning, so
> > any help/hints would be appreciated.
>
> Setting statistics higher (100-200), at least for the large table,
> will likely help. Also make sure that you've set effective_cache_size
> correctly (I generally set it to total memory - 1G, assuming the
> server has at least 4G in it).

Thank you: the problem was effective_cache_size (which I hadn't changed
from the default of 1000). This machine doesn't have loads of RAM, but
I knocked it up to 65536 and now the query uses the index, without
having to change the statistics.

Thanks a lot!

> --
> Jim Nasby                        jimn@xxxxxxxxxxxxxxxx
> EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)
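
For the archives, here's what the fix boils down to. This is a minimal
sketch, assuming the stock 8 kB block size, under which
effective_cache_size is counted in disk pages (so 65536 comes out to
512 MB). The statistics bump is Jim's suggestion, which I didn't end up
needing:

    -- Tell the planner how much cache (shared buffers plus OS cache)
    -- it can assume. In 8 kB pages, 65536 = 512 MB. Set it in
    -- postgresql.conf for a permanent change, or per-session:
    SET effective_cache_size = 65536;

    -- Jim's other suggestion: raise the statistics target on the big
    -- table's join column, then re-gather statistics:
    ALTER TABLE node_change ALTER COLUMN rev SET STATISTICS 200;
    ANALYZE node_change;

Note that effective_cache_size doesn't allocate anything; it only tells
the planner how likely index pages are to be cached, which is why the
default of 1000 (8 MB) made the index scan look so expensive.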