RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




   >  -----Original Message-----
   >  From: Tom Lane <tgl@xxxxxxxxxxxxx>
   >  Sent: Sunday, August 29, 2021 12:19
   >  To: ldh@xxxxxxxxxxxxxxxxxx
   >  Cc: Ranier Vilela <ranier.vf@xxxxxxxxx>; Andrew Dunstan
   >  <andrew@xxxxxxxxxxxx>; Justin Pryzby <pryzby@xxxxxxxxxxxxx>; pgsql-
   >  performance@xxxxxxxxxxxxxx
   >  Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2
   >  and 13.4
   >  
   >  "ldh@xxxxxxxxxxxxxxxxxx" <ldh@xxxxxxxxxxxxxxxxxx> writes:
   >  > Is it possible that the client I am using or the way I am creating the test
   >  database might affect this scenario? I use DBeaver and use the default
   >  settings to create the database:
   >  > - default encoding: UTF8
   >  > - collate: English_United States.1252
   >  > - ctype: English_United States.1252
   >  
   >  Yeah, I was thinking of quizzing you about that.  I wonder whether
   >  something is thinking it needs to transcode to WIN1252 encoding and
   >  then back to UTF8, based on the .1252 property of the LC_XXX settings.
   >  That shouldn't account for any 500X factor either, but we're kind of
   >  grasping at straws here.
   >  
   >  Does Windows have any locale choices that imply UTF8 encoding
   >  exactly, and if so, do your results change when using that?  Alternatively,
   >  try creating a database with WIN1252 encoding and those locale
   >  settings.
   >  
   >  			regards, tom lane

Yeah, grasping at straws... and no material changes 😊 This is mystifying.

show lc_messages;
-- English_United States.1252

create table sampletest (a varchar, b varchar);
insert into sampletest (a, b)
select substr(md5(random()::text), 0, 15), (100000000*random())::integer::varchar
  from generate_series(1,100000);

CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
RETURNS real AS $$
BEGIN
  RETURN case when str is null then val else str::real end;
EXCEPTION WHEN OTHERS THEN
  RETURN val;
END;
$$ LANGUAGE plpgsql COST 1 IMMUTABLE;

explain (analyze,buffers,COSTS,TIMING) 
select MAX(toFloat(a, null)) as "a" from sampletest
--Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual time=89527.032..89527.033 rows=1 loops=1)
--  Buffers: shared hit=647
--  ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056 width=32) (actual time=0.024..37.811 rows=100000 loops=1)
--        Buffers: shared hit=637
--Planning:
--  Buffers: shared hit=24
--Planning Time: 0.347 ms
--Execution Time: 89527.501 ms

explain (analyze,buffers,COSTS,TIMING) 
select MAX(toFloat(b, null)) as "b" from sampletest
--Aggregate  (cost=2137.00..2137.01 rows=1 width=4) (actual time=186.605..186.606 rows=1 loops=1)
--  Buffers: shared hit=637
--  ->  Seq Scan on sampletest  (cost=0.00..1637.00 rows=100000 width=8) (actual time=0.008..9.679 rows=100000 loops=1)
--        Buffers: shared hit=637
--Planning:
--  Buffers: shared hit=4
--Planning Time: 0.339 ms
--Execution Time: 186.641 ms


At this point, I am not sure how to proceed except to rethink that toFloat() function and many other places where we use exceptions. We get such dirty data that I need a "safe" way to convert a string to float without throwing an exception. BTW, I tried other combinations in case there may have been some weird interactions with the ::REAL conversion operator, but nothing made any change. Could you recommend another approach off the top of your head? I could use regexes for testing etc... Or maybe there is another option like a no-throw conversion that's built in or in some extension that you may know of? Like the "SAFE." Prefix in BigQuery.

Thank you,
Laurent.







[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux