Re: Efficiency of EXISTS?

Kenneth Tilton <ktilton@xxxxxxxx> · Mon, 23 Jul 2012 15:12:54 -0700

On Mon, Jul 23, 2012 at 2:52 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:

On Mon, Jul 23, 2012 at 4:12 PM, Kenneth Tilton <ktilton@xxxxxxxx> wrote:

> My mental model of the EXISTS clause must be off. This snippet appears at

> the end of a series of WITH clauses I suspect are irrelevant:

>

>> with etc etc ... , cids as

>>   (select distinct c.id from ddr2 c

>> join claim_entries ce on ce.claim_id = c.id

>> where (c.assigned_ddr = 879

>> or exists (select 1 from ddr_cdt dc

>> where

>> dc.sys_user_id = 879

>> and dc.document_type = c.document_type

>> -- makes it faster: and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code)

>> )))

>>

>> select count(*) from cids

>

>

> If I uncomment the bit where it says "make it faster" I get decent response

> and the graphical analyze display shows the expected user+doctype+cdtcode

> index is being used (and nice thin lines suggesting efficient lookup).

>

> As it is, the analyze display shows the expected user+doctype index* being

> used but the lines are fat, and performance is an exponential disaster.

>

> * I created the (to me ) redundant user+doctype index trying to get Postgres

> to Do the Right Thing(tm), but I can see that was not the issue.

>

> I presume the reason performance drops off a cliff is because there can be

> 9000 cdt_codes for one user+doctype, but I was hoping EXISTS would just look

> to see if there was at least one row matching user+doctype and return its

> decision. I have tried select *, select 1, and limit 1 on the nested select

> to no avail.

>

> Am I just doing something wrong? I am a relative noob. Is there some other

> hint I can give the planner?

hard to say without having the explain analyze output.  also it's not

clear why you need to use WITH, at least for the terminating query.

I'd just do:

select count(*) from

(

  inner_query

)

OK. Here is the full query:

with ddr as (
select c.id
 ,case
   when c.portal_user_id is null then u.provider_facility_id
   else pu.provider_facility_id
  end provider_facility_id
 from claims c
 left join sys_users u on u.id = c.created_by
 left join portal_users pu on pu.id = c.portal_user_id
 WHERE c.deleted = 0
 AND c.status >= 0
 AND (c.created_by is not null or c.portal_user_id is not null)
 AND true not in ( select ineligible_code_id in (46,65)
			from claim_carcs cc
			where c.id = cc.claim_id
			and cc.deleted = 0 )

 AND (false OR c.document_type = 0)
 AND (false OR c.group_plan_id = 44)

 limit 1500
)

,ddr2 as (
select c.id

 , c.document_type
 , c.radiographs
 , c.nea_number
 , c.assigned_ddr
 , d.provider_facility_id as submitting_facility_id
 , count(ca.id) as claim_attachments_count
 , cast(exists (select 1 from triples where s = c.id and sda='claims' and p = 'ddr-review-passed-on-by') as boolean) as passedon
 from ddr d
inner join
	claims c on d.id = c.id
	join claim_attachments ca on c.id = ca.claim_id
group by
 c.id
 , submitting_facility_id
having ((nullif(trim(c.nea_number, ' '),'') is not null)
	or case transmission_method
		when 'P' then count(distinct ca.id) > 1
		else count(distinct ca.id) > 0
	   end
	or c.radiographs > 0))

, cids as
  (select distinct c.id from ddr2 c
	join claim_entries ce on ce.claim_id = c.id
	where (c.assigned_ddr = 879
		or exists (select 1 from ddr_cdt dc 
				where
				dc.sys_user_id = 879
				and dc.document_type = c.document_type
				--and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code)
				)))

select count(*) from cids

And the explain output:

"Aggregate  (cost=56060.60..56060.61 rows=1 width=0)"
"  CTE ddr"
"    ->  Limit  (cost=306.29..16203.83 rows=1500 width=16)"
"          ->  Nested Loop Left Join  (cost=306.29..7442626.75 rows=702214 width=16)"
"                ->  Hash Left Join  (cost=306.29..7244556.97 rows=702214 width=12)"
"                      Hash Cond: (c.created_by = u.id)"
"                      ->  Index Scan using claims_lca1 on claims c  (cost=0.00..7230212.96 rows=702214 width=12)"
"                            Index Cond: ((deleted = 0) AND (status >= 0) AND (group_plan_id = 44) AND (document_type = 0))"
"                            Filter: (((created_by IS NOT NULL) OR (portal_user_id IS NOT NULL)) AND (NOT (SubPlan 1)))"
"                            SubPlan 1"
"                              ->  Index Scan using claim_carcs_claim_id on claim_carcs cc  (cost=0.00..9.23 rows=1 width=4)"
"                                    Index Cond: (c.id = claim_id)"
"                                    Filter: (deleted = 0)"
"                      ->  Hash  (cost=224.46..224.46 rows=6546 width=8)"
"                            ->  Seq Scan on sys_users u  (cost=0.00..224.46 rows=6546 width=8)"
"                ->  Index Scan using portal_users_pkey on portal_users pu  (cost=0.00..0.27 rows=1 width=8)"
"                      Index Cond: (id = c.portal_user_id)"
"  CTE ddr2"
"    ->  GroupAggregate  (cost=25714.40..28093.98 rows=286 width=27)"
"          Filter: ((NULLIF(btrim((c.nea_number)::text, ' '::text), ''::text) IS NOT NULL) OR CASE c.transmission_method WHEN 'P'::bpchar THEN (count(DISTINCT ca.id) > 1) ELSE (count(DISTINCT ca.id) > 0) END OR (c.radiographs > 0))"
"          ->  Sort  (cost=25714.40..25715.11 rows=286 width=27)"
"                Sort Key: c.id, d.provider_facility_id"
"                ->  Nested Loop  (cost=0.00..25702.73 rows=286 width=27)"
"                      ->  Nested Loop  (cost=0.00..12752.74 rows=1500 width=27)"
"                            ->  CTE Scan on ddr d  (cost=0.00..30.00 rows=1500 width=8)"

"                            ->  Index Scan using claims_pkey on claims c  (cost=0.00..8.47 rows=1 width=19)"
"                                  Index Cond: (id = d.id)"
"                      ->  Index Scan using claim_attachments_claim on claim_attachments ca  (cost=0.00..8.61 rows=2 width=8)"
"                            Index Cond: (claim_id = c.id)"
"          SubPlan 3"
"            ->  Index Scan using triples_s_idx on triples  (cost=0.00..8.28 rows=1 width=0)"
"                  Index Cond: (s = c.id)"
"                  Filter: ((sda = 'claims'::text) AND (p = 'ddr-review-passed-on-by'::text))"
"          SubPlan 4"
"            ->  Bitmap Heap Scan on triples  (cost=102.70..1010.15 rows=823 width=8)"
"                  Recheck Cond: (p = 'ddr-review-passed-on-by'::text)"
"                  Filter: (sda = 'claims'::text)"
"                  ->  Bitmap Index Scan on triples_p_idx  (cost=0.00..102.49 rows=3497 width=0)"
"                        Index Cond: (p = 'ddr-review-passed-on-by'::text)"
"  CTE cids"
"    ->  HashAggregate  (cost=11759.51..11760.52 rows=101 width=4)"
"          ->  Nested Loop  (cost=0.00..11722.94 rows=14627 width=4)"
"                ->  CTE Scan on ddr2 c  (cost=0.00..112.75 rows=144 width=4)"
"                      Filter: ((assigned_ddr = 879) OR (alternatives: SubPlan 6 or hashed SubPlan 7))"
"                      SubPlan 6"
"                        ->  Seq Scan on ddr_cdt dc  (cost=0.00..134293.58 rows=361282 width=0)"
"                              Filter: ((sys_user_id = 879) AND (document_type = c.document_type))"
"                      SubPlan 7"
"                        ->  Bitmap Heap Scan on ddr_cdt dc  (cost=20292.74..73868.80 rows=1083845 width=4)"
"                              Recheck Cond: (sys_user_id = 879)"
"                              ->  Bitmap Index Scan on "ddr-cdt-idx-user-doc"  (cost=0.00..20021.78 rows=1083845 width=0)"
"                                    Index Cond: (sys_user_id = 879)"
"                ->  Index Scan using claim_entries_claim_id on claim_entries ce  (cost=0.00..79.35 rows=102 width=4)"
"                      Index Cond: (claim_id = c.id)"
"  ->  CTE Scan on cids  (cost=0.00..2.02 rows=101 width=0)"

More interesting: I tried reducing the complex query to a simpler query and what I saw was that my mental model of EXISTS is fine. :) It was efficient in the way I expected, and faster than the version that did the last test (the cdt_code test). Now I just have to find out why it is slower in vivo.

Thx, ken

-- 
Kenneth Tilton

Director of Software Development
MCNA Dental Plans

200 West Cypress Creek Road
Suite 500
Fort Lauderdale, FL 33309
954-730-7131 X181 (Office)
954-628-3347 (Fax)

1-800-494-6262 X181 (Toll Free)

ktilton@mcna.net (Email)

www.mcna.net (Website)
CONFIDENTIALITY NOTICE: This electronic mail may contain information that is privileged, confidential, and/or otherwise protected from disclosure to anyone other than its intended recipient(s). Any dissemination or use of this electronic mail or its contents by persons other than the intended recipient(s) is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail so that we may correct our internal records. Please then delete the original message. Thank you.