Re: [HACKERS] EXISTS optimization

"Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> · Fri, 23 Mar 2007 17:26:04 -0500

>>> On Fri, Mar 23, 2007 at  4:49 PM, in message <25339.1174686582@xxxxxxxxxxxxx>,
Tom Lane <tgl@xxxxxxxxxxxxx> wrote: 
> "Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> writes:
>> explain analyze
>> SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", 
> "H"."userId", "H"."time"
>>   FROM "Adjustment" "A"
>>   JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND 
> "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
>>   WHERE "H"."tranType" = 'A'
>>     AND "A"."date" > DATE '2006- 01- 01'
>>     AND "H"."countyNo" = 66
>>     AND "A"."countyNo" = 66
>>     AND EXISTS
>>         (
>>           SELECT 1 FROM "TranDetail" "D"
>>             WHERE "D"."tranNo" = "H"."tranNo"
>>               AND "D"."countyNo" = "H"."countyNo"
>>               AND "D"."caseNo" LIKE '2006TR%'
>>         )
>> ;
> 
>> The commercial product scans the index on caseNo in TranDetail to build a 
> work table of unique values, then uses indexed access to the TranHeader and 
> then to Adjustment.
> 
> If you want that, try rewriting the EXISTS to an IN:
> 
>    AND ("H"."tranNo", "H"."countyNo") IN
>         (
>           SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D"
>             WHERE "D"."caseNo" LIKE '2006TR%'
>         )

Nice.  I get this:

explain analyze
SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
  FROM "Adjustment" "A"
  JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
  WHERE "H"."tranType" = 'A'
    AND "A"."date" > DATE '2006- 01- 01'
    AND "H"."countyNo" = 66
    AND "A"."countyNo" = 66
    AND ("H"."tranNo", "H"."countyNo") IN
        (
          SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D"
            WHERE "D"."caseNo" LIKE '2006TR%'
        )
;

 Nested Loop  (cost=27.76..36.38 rows=1 width=46) (actual time=92.999..200.398 rows=2209 loops=1)
   Join Filter: (("H"."tranNo")::integer = ("A"."tranNo")::integer)
   ->  Nested Loop  (cost=27.76..32.08 rows=1 width=50) (actual time=92.970..176.472 rows=2209 loops=1)
         ->  HashAggregate  (cost=27.76..27.77 rows=1 width=6) (actual time=92.765..100.810 rows=9788 loops=1)
               ->  Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D"  (cost=0.00..27.66 rows=20 width=6) (actual time=0.059..60.967 rows=46301 loops=1)
                     Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar) AND (("countyNo")::smallint = 66))
                     Filter: (("caseNo")::bpchar ~~ '2006TR%'::text)
         ->  Index Scan using "TranHeader_pkey" on "TranHeader" "H"  (cost=0.00..4.30 rows=1 width=46) (actual time=0.006..0.006 rows=0 loops=9788)
               Index Cond: ((("H"."tranNo")::integer = ("D"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66))
               Filter: (("tranType")::bpchar = 'A'::bpchar)
   ->  Index Scan using "Adjustment_pkey" on "Adjustment" "A"  (cost=0.00..4.28 rows=1 width=22) (actual time=0.008..0.009 rows=1 loops=2209)
         Index Cond: ((("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar) AND (("A"."countyNo")::smallint = 66))
         Filter: ((date)::date > '2006-01-01'::date)
 Total runtime: 201.306 ms

That's the good news.  The bad news is that I operate under a management portability dictate which doesn't currently allow that syntax, since not all of the products they want to cover support it.  I tried something which seems equivalent, but it is running for a very long time.  I'll show it with just the explain while I wait to see how long the explain analyze takes.

explain
SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
  FROM "Adjustment" "A"
  JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
  WHERE "H"."tranType" = 'A'
    AND "A"."date" > DATE '2006- 01- 01'
    AND "H"."countyNo" = 66
    AND "A"."countyNo" = 66
    AND "H"."tranNo" IN
        (
          SELECT "D"."tranNo" FROM "TranDetail" "D"
            WHERE "D"."caseNo" LIKE '2006TR%'
              AND "D"."countyNo" = "H"."countyNo"
        )
;

 Nested Loop  (cost=0.00..181673.08 rows=1 width=46)
   Join Filter: (("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar)
   ->  Seq Scan on "Adjustment" "A"  (cost=0.00..2384.27 rows=11733 width=22)
         Filter: (((date)::date > '2006-01-01'::date) AND (("countyNo")::smallint = 66))
   ->  Index Scan using "TranHeader_pkey" on "TranHeader" "H"  (cost=0.00..15.27 rows=1 width=46)
         Index Cond: ((("H"."tranNo")::integer = ("A"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66))
         Filter: ((("tranType")::bpchar = 'A'::bpchar) AND (subplan))
         SubPlan
           ->  Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D"  (cost=0.00..27.66 rows=20 width=4)
                 Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar) AND (("countyNo")::smallint = ($0)::smallint))
                 Filter: (("caseNo")::bpchar ~~ '2006TR%'::text)

> We don't currently try to flatten EXISTS into a unique/join plan as we
> do for IN.  I seem to recall not doing so when I rewrote IN planning
> because I didn't think it would be exactly semantically equivalent,
> but that was awhile ago.  Right at the moment it seems like it ought
> to be equivalent as long as the comparison operators are strict.

There are a great many situations where they are exactly semantically equivalent.  In fact, the commercial database product usually generates an identical plan.  I could try to work out (or better yet find) a formal description of when that equivalence holds, if someone would be up for implementing it.  Barring that, I could see if management would approve some time for me to look at submitting a patch, but I haven't looked at the code involved, so I have no idea of the scale of effort involved yet.

-Kevin