Re: Differences in identical queries

Richard Huxton <dev@xxxxxxxxxxxx> · Fri, 02 Mar 2007 15:38:38 +0000

Rob Schall wrote:
When I reanalyzed the anitmp table with just the 4 entries (2 with
istf=true and 2 with istf=false), both queries then ran the same way/time.
So it would appear, if you want to do a join or a subselect (IN), then
the number of items if will be comparing it to must be less than a
certain number.

I'm still not sure how you're concluding that.

> In my case, the full amount of false's that fail is
actually 40. So if you have a table of 2 million items (in current) and
want to join it to a table of 40 items, then it must do the longer
hashed join version and not the nested loop.

Below are the results with the anitmp table with 42 items. 40 set as
false, and 2 set as true. Is there a way to rewrite my query to have it
run as quick as the others?

calldetail=> EXPLAIN ANALYZE SELECT current.* FROM current JOIN anitmp
ON current.orignum=anitmp.ani AND istf=false;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.63..1031833.26 rows=256563 width=108) (actual
time=1889.469..155380.749 rows=653 loops=1)

OK - so here's the root of the problem. The planner thinks it'll get 
back 256,563 rows but actually gets 653. If you actually got more than 
200,000 rows back then a seq-scan on current might well make sense.

   Hash Cond: ("outer".orignum = "inner".ani)
   ->  Seq Scan on current  (cost=0.00..920527.00 rows=10873900
width=108) (actual time=670.402..136192.991 rows=10681150 loops=1)
   ->  Hash  (cost=1.52..1.52 rows=41 width=8) (actual time=0.187..0.187
rows=0 loops=1)
         ->  Seq Scan on anitmp  (cost=0.00..1.52 rows=41 width=8)
(actual time=0.014..0.108 rows=40 loops=1)
               Filter: (istf = false)

Hmm - what sort of distribution of values do you have in "orignum" - 
telephone numbers, so presumably they're quite distinct.

I'd be tempted to up the statistics on that column, reanalyse both 
tables and see what happens.
  ALTER TABLE current ALTER COLUMN orignum SET STATISTICS=100;
You can set values up to 1000, start at 100 and step up. Not checked the 
syntax on that last statement btw.

You can see the before and after effects by looking at:
SELECT * FROM pg_stats WHERE tablename='current' AND attname='orignum';

--
  Richard Huxton
  Archonet Ltd