Re: Postgres 15 SELECT query doesn't use index under RLS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Alexander Okulovich <aokulovich@xxxxxxxxxxxxx> writes:
> Recently, we upgraded the AWS RDS instance from Postgres 12.14 to 15.4 
> and noticed extremely high disk consumption on the following query 
> execution:
> select (exists (select 1 as "one" from "public"."indexed_commit" where 
> "public"."indexed_commit"."repo_id" in (964992,964994,964999, ...);
> For some reason, the query planner starts using Seq Scan instead of the 
> index on the "repo_id" column when requesting under user limited with 
> RLS. On prod, it happens when there are more than 316 IDs in the IN part 
> of the query, on stage - 3. If we execute the request from Superuser, 
> the planner always uses the "repo_id" index.

The superuser bypasses the RLS policy.  When that's enforced, the
query can no longer use an index-only scan (because it needs to fetch
tenant_id too).  Moreover, it may be that only a small fraction of the
rows fetched via the index will satisfy the RLS condition.  So the
estimated cost of an indexscan query could be high enough to persuade
the planner that a seqscan is a better idea.

> Luckily, we can easily reproduce this on our stage database (which is 
> smaller). If we add a multicolumn "repo_id, tenant_id" index, the 
> planner uses it (Index Only Scan) with any IN params count under RLS.

Yeah, that would be the obvious way to ameliorate both problems.

If in fact you were getting decent performance from an indexscan plan
before, the only explanation I can think of is that the repo_ids you
are querying for are correlated with the tenant_id, so that the RLS
filter doesn't eliminate very many rows from the index result.  The
planner wouldn't realize that by default, but if you create extended
statistics on repo_id and tenant_id then it might do better.  Still,
you probably want the extra index.

> Could you please clarify if this is a Postgres bug or not?

You haven't shown any evidence suggesting that.

> Should we 
> include the "tenant_id" column in all our indexes to make them work 
> under RLS?

Adding tenant_id is going to bloat your indexes quite a bit,
so I wouldn't do that except in cases where you've demonstrated
it's important.

			regards, tom lane





[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux