Re: Statistics and Multi-Column indexes

Samuel Gendler <sgendler@xxxxxxxxxxxxxxxx> · Sun, 10 Jul 2011 14:31:25 -0700

On Sun, Jul 10, 2011 at 2:16 PM, lars <lhofhansl@xxxxxxxxx> wrote:

I know this has been discussed various times...

We are maintaining a large multi tenant database where *all* tables have a tenant-id and all indexes and PKs lead with the tenant-id.

Statistics and counts for the all other columns are only really meaningful within the context of the tenant they belong to.

There appear to be five options for me:

1. Using single column indexes on all interesting columns and rely on PostgreSQLs bitmap indexes to combine them (which are pretty cool).

2. Use multi column indexes and accept that sometimes Postgres pick the wrong index (because a non-tenant-id

column might seem highly selective over the table, but it is not for a particular tenant - or vice versa).

3. Use a functional index that combines multiple columns and only query via these, that causes statistics

gathering for the _expression_.

I.e. create index i on t((tenantid||column1)) and SELECT ... FROM t WHERE tenantid||column1 = '...'

4. Play with n_distinct and/or set the statistics for the inner columns to some fixed values that lead to the plans that we want.

5. Have a completely different schema and maybe a database per tenant.

What about partitioning tables by tenant id and then maintaining indexes on each partition independent of tenant id, since constraint exclusion should handle filtering by tenant id for you.  That seems like a potentially more tolerable variant of #5 How many tenants are we talking about?  I gather partitioning starts to become problematic when the number of partitions gets large.