Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 30, 2021 at 8:44 AM ldh@xxxxxxxxxxxxxxxxxx
<ldh@xxxxxxxxxxxxxxxxxx> wrote:
>
> Yeah, grasping at straws... and no material changes 😊 This is mystifying.
>
> show lc_messages;
> -- English_United States.1252
>
> create table sampletest (a varchar, b varchar);
> insert into sampletest (a, b)
> select substr(md5(random()::text), 0, 15), (100000000*random())::integer::varchar
>   from generate_series(1,100000);
>
> CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
> RETURNS real AS $$
> BEGIN
>   RETURN case when str is null then val else str::real end;
> EXCEPTION WHEN OTHERS THEN
>   RETURN val;
> END;
> $$ LANGUAGE plpgsql COST 1 IMMUTABLE;
>
> explain (analyze,buffers,COSTS,TIMING)
> select MAX(toFloat(a, null)) as "a" from sampletest
> --Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual time=89527.032..89527.033 rows=1 loops=1)
> --  Buffers: shared hit=647
> --  ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056 width=32) (actual time=0.024..37.811 rows=100000 loops=1)
> --        Buffers: shared hit=637
> --Planning:
> --  Buffers: shared hit=24
> --Planning Time: 0.347 ms
> --Execution Time: 89527.501 ms
>
> explain (analyze,buffers,COSTS,TIMING)
> select MAX(toFloat(b, null)) as "b" from sampletest
> --Aggregate  (cost=2137.00..2137.01 rows=1 width=4) (actual time=186.605..186.606 rows=1 loops=1)
> --  Buffers: shared hit=637
> --  ->  Seq Scan on sampletest  (cost=0.00..1637.00 rows=100000 width=8) (actual time=0.008..9.679 rows=100000 loops=1)
> --        Buffers: shared hit=637
> --Planning:
> --  Buffers: shared hit=4
> --Planning Time: 0.339 ms
> --Execution Time: 186.641 ms
>
>
> At this point, I am not sure how to proceed except to rethink that toFloat() function and many other places where we use exceptions. We get such dirty data that I need a "safe" way to convert a string to float without throwing an exception. BTW, I tried other combinations in case there may have been some weird interactions with the ::REAL conversion operator, but nothing made any change. Could you recommend another approach off the top of your head? I could use regexes for testing etc... Or maybe there is another option like a no-throw conversion that's built in or in some extension that you may know of? Like the "SAFE." Prefix in BigQuery.

I tried this scenario using edb's 13.3 x64 install:

postgres=# select version();
                          version
------------------------------------------------------------
 PostgreSQL 13.3, compiled by Visual C++ build 1914, 64-bit
(1 row)


postgres=# \l postgres
                          List of databases
   Name   |  Owner   | Encoding | Collate | Ctype | Access privileges
----------+----------+----------+---------+-------+-------------------
 postgres | postgres | UTF8     | C       | C     |
(1 row)

postgres=# explain (analyze,buffers,COSTS,TIMING)
postgres-# select MAX(toFloat(a, null)) as "a" from sampletest;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=2137.00..2137.01 rows=1 width=4) (actual
time=44962.279..44962.280 rows=1 loops=1)
   Buffers: shared hit=657
   ->  Seq Scan on sampletest  (cost=0.00..1637.00 rows=100000
width=15) (actual time=0.009..8.900 rows=100000 loops=1)
         Buffers: shared hit=637
 Planning:
   Buffers: shared hit=78
 Planning Time: 0.531 ms
 Execution Time: 44963.747 ms
(8 rows)

and with locally compiled REL_13_STABLE's head on the same machine:

rjuju=# select version();
                          version
------------------------------------------------------------
 PostgreSQL 13.4, compiled by Visual C++ build 1929, 64-bit
(1 row)

rjuju=# \l rjuju
                       List of databases
 Name  | Owner | Encoding | Collate | Ctype | Access privileges
-------+-------+----------+---------+-------+-------------------
 rjuju | rjuju | UTF8     | C       | C     |
(1 row)

rjuju-# select MAX(toFloat(a, null)) as "a" from sampletest;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1477.84..1477.85 rows=1 width=4) (actual
time=460.334..460.334 rows=1 loops=1)
   Buffers: shared hit=646 read=1
   ->  Seq Scan on sampletest  (cost=0.00..1197.56 rows=56056
width=32) (actual time=0.010..7.612 rows=100000 loops=1)
         Buffers: shared hit=637
 Planning:
   Buffers: shared hit=20 read=1
 Planning Time: 0.125 ms
 Execution Time: 460.527 ms
(8 rows)

Note that I followed [1], so I simply used "build" and "install".  I
have no idea what is done by default and if NLS is included or not.

So if default build on windows has NLS included, it probably means
that either there's something specific on edb's build (I have no idea
how their build is produced) or their version of msvc is responsible
for that.

[1]: https://www.postgresql.org/docs/current/install-windows-full.html#id-1.6.4.8.10






[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux