Re: Function's execute overhead reducing

Игорь Выскорко <vyskorko.igor@xxxxxxxxx> · Mon, 18 Nov 2019 12:23:23 +0700

> My suspicion is that the query in the function is hidden from the
> planner and so it ends up running two separate SELECT queries without
> reference to each other. A quick test and possible solution:
>
> My results for the above on old machine.:
>
> Straight query:
>                                                              QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------
>   Seq Scan on tst (cost=0.00..1644807.00 rows=500000 width=116) (actual
> time=0.033..2808.596 rows=500000 loops=1)
>     SubPlan 1
>       -> Aggregate (cost=3.25..3.26 rows=1 width=8) (actual
> time=0.005..0.005 rows=1 loops=500000)
>             -> Function Scan on jsonb_each _a (cost=0.00..1.00
> rows=100 width=64) (actual time=0.003..0.003 rows=3 loops=500000)
>   Planning Time: 16.162 ms
>   Execution Time: 2846.815 ms
>
> Function in query
>                                                     QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------
>   Seq Scan on tst (cost=0.00..138557.00 rows=500000 width=116) (actual
> time=0.119..7048.285 rows=500000 loops=1)
>   Planning Time: 0.105 ms
>   Execution Time: 7098.057 ms
>
> I changed the function to:
>
> CREATE OR REPLACE FUNCTION public.tst_func(i integer, a jsonb, b jsonb)
>   RETURNS TABLE(id integer, val bigint)
>   LANGUAGE sql
>   STABLE
> AS $function$
>    select
>      i,
>      sum(
>        ((_a.value::text)::int - (coalesce(b->>_a.key, '0'))::int)::int
>      )
>    from
>      jsonb_each(a) _a
> $function$
>
> Using 'table' function:
>
> test=# explain analyze select
>    tst.id,
>    j1,
>    j2
> from
>    tst
> join
>    tst_func(id, j1, j2) as f
> on
>   tst.id = f.id;
>                                                               QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------------
>   Nested Loop (cost=1.00..531057.00 rows=1 width=108) (actual
> time=0.042..2002.258 rows=500000 loops=1)
>     -> Seq Scan on tst (cost=0.00..13557.00 rows=500000 width=108)
> (actual time=0.014..70.936 rows=500000 loops=1)
>     -> Subquery Scan on f (cost=1.00..1.02 rows=1 width=4) (actual
> time=0.003..0.003 rows=1 loops=500000)
>           Filter: (tst.id = f.id)
>           -> Aggregate (cost=1.00..1.01 rows=1 width=12) (actual
> time=0.003..0.003 rows=1 loops=500000)
>                 -> Function Scan on jsonb_each _a (cost=0.00..1.00
> rows=100 width=0) (actual time=0.003..0.003 rows=3 loops=500000)
>   Planning Time: 0.297 ms
>   Execution Time: 2037.601 ms
>
>>  PS current situation maybe solved by denormalization with precalculations of function and storing results along with data, but it's not the way i would like to use, because it leads to more issues to solve (invalidation, for instance)
>
> --
> Adrian Klaver
> adrian.klaver@xxxxxxxxxxx

Thanks a lot! It works even better than subquery