Re: expensive function in select list vs limit clause

Albe Laurenz <laurenz.albe@xxxxxxxxxx> · Wed, 5 Apr 2017 10:34:22 +0000

Chris Mair wrote:
> I've found a (simple) situation where the planner does something I don't understand.
> 
> Below is a complete test case followed by output.
> 
>  From the timings it appears that in the second explain analyze query a function
> call in the select list (expensive()) is evaluated in the sequential scan node
> *for each* row in big, despite the use of limit.
> 
> I would have expected expensive() to be evaluated only for the ten rows
> in the result set. Hence the second explain analyze query shouldn't be more
> expensive than the first one.
> 
> My trust in Postgres' planner goes so far as I feel the planner is right and there
> must be a reason for this :)
> 
> Could someone help me understand this behaviour?
[...]
> create function expensive() returns double precision as
> $$
>      begin
>          for i in 1 .. 15000 loop
>          end loop;
>          return random();
>      end;
> $$ language 'plpgsql';

This is unrelated, but you should set COST for an expensive function
to help the planner.

[...]
> -- now do the same, but add an expensive() column to the result:
> -- takes ~ 29s => WHY?
> 
> explain analyze select r, expensive() from big order by r offset 0 limit 10;
[...]
>                                                            QUERY PLAN
> ------------------------------------------------------------------------------------------
> ------------------------------------
>   Limit  (cost=286034.64..286034.67 rows=10 width=8) (actual time=28932.311..28932.314
> rows=10 loops=1)
>     ->  Sort  (cost=286034.64..288534.64 rows=1000000 width=8) (actual
> time=28932.309..28932.310 rows=10 loops=1)
>           Sort Key: r
>           Sort Method: top-N heapsort  Memory: 25kB
>           ->  Seq Scan on big  (cost=0.00..264425.00 rows=1000000 width=8) (actual
> time=0.062..28822.520 rows=1000000 loops=1)
>   Planning time: 0.038 ms
>   Execution time: 28932.339 ms
> (7 rows)

ORDER BY can only be processed after all rows have been fetched, this
includes the expensive result column.

You can easily avoid that by applying the LIMIT first:

  SELECT r, expensive()
  FROM (SELECT r
        FROM big
        ORDER BY r
        LIMIT 10
       ) inner;

I don't know how hard it would be to only fetch the necessary columns before
the ORDER BY and fetch the others after the LIMIT has been applied, but it
is probably nontrivial and would require processing time for *everybody*
who runs a query with ORDER BY to solve a rare problem that can easily be
worked around.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general