OK… I apologize for the long email before. Right after I sent it, I thought of a much simpler use-case to illustrate the issue which doesn’t depend on any special data I have access o and complex pivoting. It’s as raw as I can make it. I create a table with 1M rows and 2 columns. Column “a” is a random string, while column “b” is a random integer as a string. Then I use a UDF that converts strings to floats and handles an exception if the incoming string is not parsable
as a float. Then I do a simple select of each column. In the “a” case, the UDF throws and catches lots of exceptions. In the “b” case, the conversion is clean and exceptions are not thrown. create table sampletest (a varchar, b varchar); insert into sampletest (a, b) select substr(md5(random()::text), 0, 15), (100000000*random())::integer::varchar from generate_series(1,1000000); CREATE OR REPLACE FUNCTION toFloat(str varchar, val real) RETURNS real AS $$ BEGIN RETURN case when str is null then val else str::real end; EXCEPTION WHEN OTHERS THEN RETURN val; END; $$ LANGUAGE plpgsql COST 1 IMMUTABLE; select MAX(toFloat(a, null)) as "a" from sampletest; select MAX(toFloat(b, null)) as "b" from sampletest; On purpose, I am doing a max(toFloat) instead of toFloat(max) to exercise the UDF 1M times. V13.4 “a” scenario (exceptions) ------------------------------------------------------------- Aggregate (cost=14778.40..14778.41 rows=1 width=4) (actual time=774098.537..774098.538 rows=1 loops=1) Buffers: shared hit=6373 -> Seq Scan on sampletest (cost=0.00..11975.60 rows=560560 width=32) (actual time=0.011..285.458 rows=1000000 loops=1) Buffers: shared hit=6370 Planning Time: 0.066 ms Execution Time: 774,098.563 ms V13.4 “b” scenario (no exceptions) ------------------------------------------------------------- Aggregate (cost=14778.40..14778.41 rows=1 width=4) (actual time=1510.200..1510.201 rows=1 loops=1) Buffers: shared hit=6385 -> Seq Scan on sampletest (cost=0.00..11975.60 rows=560560 width=32) (actual time=0.024..115.196 rows=1000000 loops=1) Buffers: shared hit=6370 Planning: Buffers: shared hit=26 Planning Time: 0.361 ms Execution Time: 1,530.659 ms V11.2 “a” scenario (exceptions) ------------------------------------------------------------- Aggregate (cost=21658.00..21658.01 rows=1 width=4) (actual time=26528.286..26528.286 rows=1 loops=1) Buffers: shared hit=6393 -> Seq Scan on sampletest (cost=0.00..16562.00 rows=1019200 width=15) (actual time=0.037..190.633 rows=1000000 loops=1) Buffers: shared hit=6370 Planning Time: 1.182 ms Execution Time: 26,530.492 ms V11.2 “b” scenario (no exceptions) ------------------------------------------------------------- Aggregate (cost=21658.00..21658.01 rows=1 width=4) (actual time=1856.116..1856.116 rows=1 loops=1) Buffers: shared hit=6370 -> Seq Scan on sampletest (cost=0.00..16562.00 rows=1019200 width=8) (actual time=0.014..88.152 rows=1000000 loops=1) Buffers: shared hit=6370 Planning Time: 0.098 ms Execution Time: 1,856.152 ms Summary:
Conclusion:
Thank you! Laurent Hasson. From: ldh@xxxxxxxxxxxxxxxxxx <ldh@xxxxxxxxxxxxxxxxxx>
Hello all, I think I have identified a major performance issue between V11.2 and 13.4 with respect to exception handling in UDFs. I have the following simplified query that pivots data and makes use of a UDF to convert data
to a specific type, in this case, float: select "iccqa_iccassmt_fk" , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null) as "iccqa_DEPTH_CM" , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null) as "iccqa_LENGTH_CM" , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null) as "iccqa_WIDTH_CM" , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null) as "iccqa_DRAIN_PRESENT" , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null) as "iccqa_MEASUREMENTS_TAKEN" , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null) as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION" from ( -- 'A pivoted view of ICC QA assessments' select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment. , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data. , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer, if applicable) from the meta-data. from VNAHGEDW_FACTS.AssessmentICCQA_Raw where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)' , 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION' ) group by 1, 2 ) T group by 1 ; The UDF is simple as follows: CREATE OR REPLACE FUNCTION TILDA.toFloat(str varchar, val real) RETURNS real AS $$ BEGIN RETURN case when str is null then val else str::real end; EXCEPTION WHEN OTHERS THEN RETURN val; END; $$ LANGUAGE plpgsql COST 1 IMMUTABLE; It works as a coalesce but with a conversion. I think I have identified some large performance difference with the exception handling. It so happens that with the last 3 columns ('DRAIN PRESENT', 'MEASUREMENTS TAKEN'
and 'SIGNS AND SYMPTOMS OF INFECTION'), the data is VERY dirty. There is a mix of 0/1, YES/NO, and other mistyped stuff. This means these 3 columns throw lots of exceptions in the UDF. To illustrate, I simply break this into 2 queries. select "iccqa_iccassmt_fk" , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null))::real as "iccqa_DEPTH_CM" , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null))::real as "iccqa_LENGTH_CM" , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null))::real as "iccqa_WIDTH_CM" -- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null))::real as "iccqa_DRAIN_PRESENT" -- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null))::real as "iccqa_MEASUREMENTS_TAKEN" -- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null))::real as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION" from ( -- 'A pivoted view of ICC QA assessments' select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment. , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data. , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer, if applicable) from the meta-data. from VNAHGEDW_FACTS.AssessmentICCQA_Raw where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)' , 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION' ) group by 1, 2 ) T group by 1 ; The performance is as expected. HashAggregate (cost=448463.70..448467.20 rows=200 width=16) (actual time=6760.797..9585.397 rows=677899 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk Batches: 1 Memory Usage: 147489kB Buffers: shared hit=158815 -> HashAggregate (cost=405997.87..417322.09 rows=1132422 width=56) (actual time=4576.514..5460.770 rows=2374628 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code Batches: 1 Memory Usage: 368657kB Buffers: shared hit=158815 -> Seq Scan on assessmenticcqa_raw (cost=0.00..388224.53 rows=2369779 width=38) (actual time=0.033..3298.544 rows=2374628 loops=1) Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[])) Rows Removed by Filter: 10734488 Buffers: shared hit=158815 Planning: Buffers: shared hit=3 Planning Time: 0.198 ms Execution Time: 9678.120 ms However, once we switch with the three “bad” columns, the results fall apart. select "iccqa_iccassmt_fk" -- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null))::real as "iccqa_DEPTH_CM" -- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null))::real as "iccqa_LENGTH_CM" -- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null))::real as "iccqa_WIDTH_CM" , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null))::real as "iccqa_DRAIN_PRESENT" , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null))::real as "iccqa_MEASUREMENTS_TAKEN" , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null))::real as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION" from ( -- 'A pivoted view of ICC QA assessments' select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment. , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data. , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer, if applicable) from the meta-data. from VNAHGEDW_FACTS.AssessmentICCQA_Raw where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)' , 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION' ) group by 1, 2 ) T group by 1 ; The performance falls apart. It is a huge performance difference from ~10s to ~11mn and the only difference that I can think of is that the data is dirty which causes the exception path to be taken. The explain
is: HashAggregate (cost=448463.70..448467.20 rows=200 width=16) (actual time=6672.921..696753.080 rows=677899 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk Batches: 1 Memory Usage: 131105kB Buffers: shared hit=158815 -> HashAggregate (cost=405997.87..417322.09 rows=1132422 width=56) (actual time=4574.918..5446.022 rows=2374628 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code Batches: 1 Memory Usage: 368657kB Buffers: shared hit=158815 -> Seq Scan on assessmenticcqa_raw (cost=0.00..388224.53 rows=2369779 width=38) (actual time=0.032..3300.616 rows=2374628 loops=1) Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[])) Rows Removed by Filter: 10734488 Buffers: shared hit=158815 Planning: Buffers: shared hit=3 Planning Time: 0.201 ms Execution Time: 696868.845 ms Now, on V11.2, the explain is: HashAggregate (cost=492171.36..492174.86 rows=200 width=16) (actual time=19322.522..50556.738 rows=743723 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk Buffers: shared hit=11 read=174155 dirtied=13 -> HashAggregate (cost=445458.43..457915.21 rows=1245678 width=56) (actual time=16260.015..17575.088 rows=2601088 loops=1) Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code Buffers: shared read=174155 dirtied=13 -> Seq Scan on assessmenticcqa_raw (cost=0.00..425803.93 rows=2620600 width=38) (actual time=0.126..14425.239 rows=2601088 loops=1) Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[])) Rows Removed by Filter: 11778360 Buffers: shared read=174155 dirtied=13 Planning Time: 36.121 ms Execution Time: 50730.255 ms So, we are seeing two issues:
-
I think exception handling is significantly slower between V11.2 and v13.4. I see almost a 14x difference from 50s to 700s.
-
Comparing the two queries on V11.2, the difference is 13s vs 50s. So even on V11.2, the exception handling adds a significant overhead which I was not expecting. I’ll be happy to update my test cases and share additional info if needed. Thank you, Laurent Hasson. |