Re: hashjoin chosen over 1000x faster plan

"Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> · Wed, 10 Oct 2007 15:11:48 -0500

>>> On Wed, Oct 10, 2007 at  2:52 PM, in message
<1192045945.4233.351.camel@xxxxxxxxxx>, Simon Riggs <simon@xxxxxxxxxxxxxxx>
wrote: 
> 
> The fast plan is an all-or-nothing plan. It is *only* faster when the
> number of matched rows is zero. You know it is zero, but currently the
> planner doesn't, nor is it able to make use of the information when it
> has it, half thru execution. Even if we could work out the high
> probability of it being zero, we would still be left with the decision
> of whether to optimise for the zero or for the non-zero.

For a different case number which has four charges, two reopened:

 Sort  (cost=2450.27..2450.28 rows=4 width=146) (actual time=463.048..463.052 rows=4 loops=1)
   Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo"
   ->  Hash Left Join  (cost=2318.93..2450.23 rows=4 width=146) (actual time=462.857..462.995 rows=4 loops=1)
         Hash Cond: ((("CHST"."eventType")::bpchar = ("CTHE"."eventType")::bpchar) AND (("CHST"."caseType")::bpchar = ("CTHE"."caseType")::bpchar))
         ->  Nested Loop Left Join  (cost=0.00..115.67 rows=4 width=131) (actual time=0.045..0.165 rows=4 loops=1)
               ->  Index Scan using "Charge_pkey" on "Charge" "CH"  (cost=0.00..10.69 rows=4 width=112) (actual time=0.036..0.053 rows=4 loops=1)
                     Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar))
               ->  Index Scan using "CaseHist_pkey" on "CaseHist" "CHST"  (cost=0.00..26.18 rows=5 width=41) (actual time=0.018..0.019 rows=0 loops=4)
                     Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar = '2004CF002575'::bpchar) AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint))
         ->  Hash  (cost=2084.82..2084.82 rows=15607 width=98) (actual time=462.780..462.780 rows=15607 loops=1)
               ->  Subquery Scan "CTHE"  (cost=1630.43..2084.82 rows=15607 width=98) (actual time=355.962..433.081 rows=15607 loops=1)
                     ->  Merge Right Join  (cost=1630.43..1928.75 rows=15607 width=89) (actual time=355.960..414.249 rows=15607 loops=1)
                           Merge Cond: (((d."countyNo")::smallint = "inner"."?column9?") AND ((d."caseType")::bpchar = "inner"."?column10?") AND ((d."eventType")::bpchar = "inner"."?column11?"))
                           ->  Index Scan using "CaseTypeHistEventD_pkey" on "CaseTypeHistEventD" d  (cost=0.00..87.77 rows=2051 width=21) (actual time=0.025..0.713 rows=434 loops=1)
                           ->  Sort  (cost=1630.43..1669.45 rows=15607 width=76) (actual time=355.320..365.251 rows=15607 loops=1)
                                 Sort Key: (c."countyNo")::smallint, (b."caseType")::bpchar, (b."eventType")::bpchar
                                 ->  Nested Loop  (cost=0.00..543.41 rows=15607 width=76) (actual time=0.035..46.914 rows=15607 loops=1)
                                       ->  Index Scan using "ControlRecord_pkey" on "ControlRecord" c  (cost=0.00..4.27 rows=1 width=2) (actual time=0.010..0.019 rows=1 loops=1)
                                             Index Cond: (("countyNo")::smallint = 13)
                                       ->  Seq Scan on "CaseTypeHistEventB" b  (cost=0.00..383.07 rows=15607 width=74) (actual time=0.019..14.069 rows=15607 loops=1)
 Total runtime: 464.588 ms
(21 rows)

With set enable_hashjoin = off:

 Sort  (cost=3404.68..3404.69 rows=4 width=146) (actual time=448.049..448.053 rows=4 loops=1)
   Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo"
   ->  Merge Left Join  (cost=3287.55..3404.64 rows=4 width=146) (actual time=447.986..448.005 rows=4 loops=1)
         Merge Cond: (("outer"."?column16?" = "inner"."?column5?") AND ("outer"."?column17?" = "inner"."?column6?"))
         ->  Sort  (cost=115.71..115.72 rows=4 width=131) (actual time=0.179..0.182 rows=4 loops=1)
               Sort Key: ("CHST"."caseType")::bpchar, ("CHST"."eventType")::bpchar
               ->  Nested Loop Left Join  (cost=0.00..115.67 rows=4 width=131) (actual time=0.051..0.139 rows=4 loops=1)
                     ->  Index Scan using "Charge_pkey" on "Charge" "CH"  (cost=0.00..10.69 rows=4 width=112) (actual time=0.040..0.053 rows=4 loops=1)
                           Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar))
                     ->  Index Scan using "CaseHist_pkey" on "CaseHist" "CHST"  (cost=0.00..26.18 rows=5 width=41) (actual time=0.013..0.014 rows=0 loops=4)
                           Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar = '2004CF002575'::bpchar) AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint))
         ->  Sort  (cost=3171.84..3210.86 rows=15607 width=98) (actual time=446.459..446.936 rows=768 loops=1)
               Sort Key: ("CTHE"."caseType")::bpchar, ("CTHE"."eventType")::bpchar
               ->  Subquery Scan "CTHE"  (cost=1630.43..2084.82 rows=15607 width=98) (actual time=322.928..405.654 rows=15607 loops=1)
                     ->  Merge Right Join  (cost=1630.43..1928.75 rows=15607 width=89) (actual time=322.922..381.371 rows=15607 loops=1)
                           Merge Cond: (((d."countyNo")::smallint = "inner"."?column9?") AND ((d."caseType")::bpchar = "inner"."?column10?") AND ((d."eventType")::bpchar = "inner"."?column11?"))
                           ->  Index Scan using "CaseTypeHistEventD_pkey" on "CaseTypeHistEventD" d  (cost=0.00..87.77 rows=2051 width=21) (actual time=0.024..0.734 rows=434 loops=1)
                           ->  Sort  (cost=1630.43..1669.45 rows=15607 width=76) (actual time=322.294..332.182 rows=15607 loops=1)
                                 Sort Key: (c."countyNo")::smallint, (b."caseType")::bpchar, (b."eventType")::bpchar
                                 ->  Nested Loop  (cost=0.00..543.41 rows=15607 width=76) (actual time=0.035..45.539 rows=15607 loops=1)
                                       ->  Index Scan using "ControlRecord_pkey" on "ControlRecord" c  (cost=0.00..4.27 rows=1 width=2) (actual time=0.010..0.016 rows=1 loops=1)
                                             Index Cond: (("countyNo")::smallint = 13)
                                       ->  Seq Scan on "CaseTypeHistEventB" b  (cost=0.00..383.07 rows=15607 width=74) (actual time=0.019..13.754 rows=15607 loops=1)
 Total runtime: 449.660 ms
(24 rows)

So in all cases it is faster without the hashjoin; it's just a
question of whether it is 4% faster or 1000 times faster, with a 99+%
chance of being 1000 times faster.

This may get back to a question I've always had about the wisdom of
rounding fractional reads to whole numbers.  You lose information
which might lead to better plan choices.  You can't read half a row,
but you can read one row half the time.

-Kevin

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster