Statistics and Multi-Column indexes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I know this has been discussed various times...

We are maintaining a large multi tenant database where *all* tables have a tenant-id and all indexes and PKs lead with the tenant-id. Statistics and counts for the all other columns are only really meaningful within the context of the tenant they belong to.

There appear to be five options for me:
1. Using single column indexes on all interesting columns and rely on PostgreSQLs bitmap indexes to combine them (which are pretty cool). 2. Use multi column indexes and accept that sometimes Postgres pick the wrong index (because a non-tenant-id column might seem highly selective over the table, but it is not for a particular tenant - or vice versa). 3. Use a functional index that combines multiple columns and only query via these, that causes statistics
gathering for the expression.
I.e. create index i on t((tenantid||column1)) and SELECT ... FROM t WHERE tenantid||column1 = '...' 4. Play with n_distinct and/or set the statistics for the inner columns to some fixed values that lead to the plans that we want.
5. Have a completely different schema and maybe a database per tenant.

Currently we use Oracle and query hinting, but I do not like that practice at all (and Postgres does not have hints anyway).
Are there any other options?

#1 would be the simplest, but I am concerned about the overhead, both maintaining two indexes and building the bitmap during queries - for every query.

I don't think #2 is actually an option. We have some tenants with many (sometimes 100s) millions of rows per table,
and picking the wrong index would be disastrous.

Could something like #3 be generally added to Postgres? I.e. if there is a multi column index keep combined statistics for the involved columns. Of course in that case is it no longer possible to query the index by prefix. #3 also seems expensive as the expression needs to be evaluated for each changed row.

Still trying #4. I guess it involves setting the stat target for the inner columns to 0 and then inserting my own records into pg_statistic. Probably only setting n_distinct, i.e. set it "low" if the inner column is not selective within the context of a tenant and "high" otherwise.

For various reasons #5 is also not an option.

And of course the same set of questions comes up with joins.

Thanks.

-- Lars


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux