Statistics and Multi-Column indexes

lars <lhofhansl@xxxxxxxxx> · Sun, 10 Jul 2011 14:16:06 -0700

I know this has been discussed various times...

We are maintaining a large multi tenant database where *all* tables have 
a tenant-id and all indexes and PKs lead with the tenant-id.
Statistics and counts for the all other columns are only really 
meaningful within the context of the tenant they belong to.

There appear to be five options for me:
1. Using single column indexes on all interesting columns and rely on 
PostgreSQLs bitmap indexes to combine them (which are pretty cool).
2. Use multi column indexes and accept that sometimes Postgres pick the 
wrong index (because a non-tenant-id
column might seem highly selective over the table, but it is not for a 
particular tenant - or vice versa).
3. Use a functional index that combines multiple columns and only query 
via these, that causes statistics
gathering for the expression.
I.e. create index i on t((tenantid||column1)) and SELECT ... FROM t 
WHERE tenantid||column1 = '...'
4. Play with n_distinct and/or set the statistics for the inner columns 
to some fixed values that lead to the plans that we want.
5. Have a completely different schema and maybe a database per tenant.

Currently we use Oracle and query hinting, but I do not like that 
practice at all (and Postgres does not have hints anyway).
Are there any other options?

#1 would be the simplest, but I am concerned about the overhead, both 
maintaining two indexes and building the bitmap during queries - for 
every query.

I don't think #2 is actually an option. We have some tenants with many 
(sometimes 100s) millions of rows per table,
and picking the wrong index would be disastrous.

Could something like #3 be generally added to Postgres? I.e. if there is 
a multi column index keep combined statistics for
the involved columns. Of course in that case is it no longer possible to 
query the index by prefix.
#3 also seems expensive as the expression needs to be evaluated for each 
changed row.

Still trying #4. I guess it involves setting the stat target for the 
inner columns to 0 and then inserting my own records into
pg_statistic. Probably only setting n_distinct, i.e. set it "low" if the 
inner column is not selective within the context of a tenant and "high" 
otherwise.

For various reasons #5 is also not an option.

And of course the same set of questions comes up with joins.

Thanks.

-- Lars

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance