optimizing immutable vs. stable function calls?

Karl Czajkowski <karlcz@xxxxxxx> · Wed, 18 Jan 2017 14:36:33 -0800

I originally was thinking I had a performance problem related to
row-level security but in reducing it to a simple test case I realized
it applies more generally.  The query planner does not seem to
recognize that it can eliminate redundant calls to a STABLE function.
It will optimize the same call if the function is marked IMMUTABLE.

In my case, the function call does not take any arguments and is thus
trivially independent of row data, and appears in a WHERE clause being
compared to constants. Why wouldn't the optimizer treat this case the
same as IMMUTABLE?

Toy problem
------------------------------------------

This abstracts a scenario where a web application stores client
identity attributes (i.e. roles or groups) into a config variable
which can be accessed from a stored procedure to allow policy checks
against that client context.

The example query emulates a policy that is a disjunction of two
rules:

  1. clients who intersect a constant ACL
     '{C,E,Q}'::text[] && current_attributes()

  2. rows whose "cls" intersect a constant mask
     cls = ANY('{0,1,5}'::text[])

This is a common idiom for us, where some rows are restricted from the
general user base but certain higher privilege clients can see all
rows.

So our test query is simply:

  SELECT *
  FROM mydata
  WHERE '{C,E,Q}'::text[] && current_attributes()
     OR cls = ANY('{0,1,5}'::text[])
     ;

Test setup
------------------------------------------

I set a very high cost on the function to attempt to encourage the
planner to optimize away the calls, but it doesn't seem to make any
difference.  Based on some other discussions I have seen, I also tried
declaring it as LEAKPROOF but saw no change in behavior there either.

  CREATE OR REPLACE FUNCTION current_attributes() RETURNS text[]
  STABLE COST 1000000
  AS $$
  BEGIN
    RETURN current_setting('mytest.attributes')::text[];
  END;
  $$ LANGUAGE plpgsql;

  CREATE TABLE mydata (
    id serial PRIMARY KEY,
    val integer,
    cls text
  );

  INSERT INTO mydata (val, cls)
  SELECT v, (v % 13)::text
  FROM generate_series(1, 1000000, 1) AS s (v);

  CREATE INDEX ON mydata(cls);
  ANALYZE mydata;

Resulting plans and performance
------------------------------------------

These results are with PostgreSQL 9.5 on a Fedora 25 workstation but I
see essentially the same behavior on 9.6 as well.

For an intersecting ACL scenario, I set client context as:

  SELECT set_config('mytest.attributes', '{A,B,C,D}', False);

and for non-intersecting, I set:

  SELECT set_config('mytest.attributes', '{}', False);

In an ideal world, the planner knows it can solve the ACL intersection
once, independent of any row data and then form a different plan
optimized around that answer, the same as if we'd just put a constant
true or false term in our WHERE clause.

A. STABLE function for intersecting ACL

   Seq Scan on mydata  (cost=0.00..2500021656.00 rows=238030 width=10) (actual time=0.053..1463.382 rows=1000000 loops=1)
     Filter: (('{C,E,Q}'::text[] && current_attributes()) OR (cls = ANY ('{0,1,5}'::text[])))
   Planning time: 0.093 ms
   Execution time: 1500.395 ms

B. IMMUTABLE function for intersecting ACL

   Seq Scan on mydata  (cost=0.00..15406.00 rows=1000000 width=10) (actual time=0.009..78.474 rows=1000000 loops=1)
   Planning time: 0.247 ms
   Execution time: 117.610 ms

C. STABLE function for non-intersecting ACL

   Seq Scan on mydata  (cost=0.00..2500021656.00 rows=238030 width=10) (actual time=0.179..1190.484 rows=230770 loops=1)
     Filter: (('{C,E,Q}'::text[] && current_attributes()) OR (cls = ANY ('{0,1,5}'::text[])))
     Rows Removed by Filter: 769230
   Planning time: 0.088 ms
   Execution time: 1199.729 ms

D. IMMUTABLE function for non-intersecting ACL

   Bitmap Heap Scan on mydata  (cost=4058.36..12631.44 rows=230333 width=10) (actual time=32.444..76.934 rows=230770 loops=1)
     Recheck Cond: (cls = ANY ('{0,1,5}'::text[]))
     Heap Blocks: exact=5406
     ->  Bitmap Index Scan on mydata_cls_idx  (cost=0.00..4000.78 rows=230333 width=0) (actual time=31.012..31.012 rows=230770 loops=1)
           Index Cond: (cls = ANY ('{0,1,5}'::text[]))
   Planning time: 0.331 ms
   Execution time: 87.475 ms

You can see the roughly 10-15x performance difference above. In our
real application with more sprawling data, sorting, and lots of
available column indices, the effects are even more pronounced.

Is there any hope for the planner to start optimizing these
row-independent, stable function calls the same way it does immutable
ones?

We tend to start a transaction, set the config parameter with the web
client identity attributes, and then run the other
performance-sensitive statements to completion (or error) in the same
transaction.  We don't further modify the parameter during the
lifetime of one web request handler.  We cycle through different
parameter settings only when we reuse a connection for multiple web
requests which may all be from different clients.

Is it safe to lie and call the function IMMUTABLE to get good plans,
even though we do modify the config value infrequently during one
postgres connection?  Does postgres ever cache immutable function
results between statements?  If so, is there any way to invalidate
that cache without fully closing and reopening a connection?

Thanks,

Karl

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance