Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 (workarounds)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 30, 2021 at 04:43:23AM +0200, Pavel Stehule wrote:
> po 30. 8. 2021 v 2:44 odesílatel ldh@xxxxxxxxxxxxxxxxxx napsal:
> > At this point, I am not sure how to proceed except to rethink that
> > toFloat() function and many other places where we use exceptions. We get
> > such dirty data that I need a "safe" way to convert a string to float
> > without throwing an exception. BTW, I tried other combinations in case
> > there may have been some weird interactions with the ::REAL conversion
> > operator, but nothing made any change. Could you recommend another approach
> > off the top of your head? I could use regexes for testing etc... Or maybe
> > there is another option like a no-throw conversion that's built in or in
> > some extension that you may know of? Like the "SAFE." Prefix in BigQuery.
> 
> CREATE OR REPLACE FUNCTION safe_to_double_precision(t text)
> RETURNS double precision AS $$
> BEGIN
>   IF $1 SIMILAR TO '[+-]?([0-9]*[.])?[0-9]+' THEN
>     RETURN $1::double precision;
>   ELSE
>     RETURN NULL;
>   END IF;
> END;
> $$ LANGUAGE plpgsql IMMUTABLE STRICT;

This tries to use a regex to determine if something is a "Number" or not.
Which has all the issues enumerated in painful detail by long answers on stack
overflow, and other wiki/blog/forums.

Rather than trying to define Numbers using regex, I'd try to avoid only the
most frequent exceptions and get 90% of the performance back.  I don't know
what your data looks like, but you might try things like this:

IF $1 IS NULL THEN RETURN $2
ELSE IF $1 ~ '^$' THEN RETURN $2
ELSE IF $1 ~ '[[:alpha:]]{2}' THEN RETURN $2
ELSE IF $1 !~ '[[:digit:]]' THEN RETURN $2
BEGIN                                                                                                                                                                                                     
   RETURN $1::float;
EXCEPTION WHEN OTHERS THEN                                                                                                                                                                                
   RETURN $2;
END;                                                                                                                                                                                                      

You can check the stackoverflow page for ideas as to what kind of thing to
reject, but it may depend mostly on your data (what is the most common string?
The most common exceptional string?).

I think it's possible that could even be *faster* than the original, since it
avoids the exception block for values which are for sure going to cause an
exception anyway.  It might be that using alternation (|) is faster (if less
readable) than using a handful of IF branches.

-- 
Justin





[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux