lars <lhofhansl@xxxxxxxxx> wrote:

> We are maintaining a large multi-tenant database where *all*
> tables have a tenant-id and all indexes and PKs lead with the
> tenant-id. Statistics and counts for all the other columns are
> only really meaningful within the context of the tenant they
> belong to.
>
> There appear to be five options for me:
>
> 1. Using single-column indexes on all interesting columns and
> rely on PostgreSQL's bitmap index scans to combine them (which
> are pretty cool).

Those are cool -- when programmers are skeptical that they should
just say *what* they want and let the database figure out how to
get it, I like to point out that this option is available to the
planner, but not to application programming. Of course, there are
a great many other reasons *I* find more compelling, but this one
tends to impress application programmers. Assuming you keep the
primary key as a multi-column index, this seems like a good place
to start. (A sketch of this follows at the end.)

> 2. Use multi-column indexes and accept that sometimes Postgres
> picks the wrong index (because a non-tenant-id column might seem
> highly selective over the table, but it is not for a particular
> tenant -- or vice versa).

If you have a lot of queries which access data based on matching
some set of columns, an occasional multicolumn index in addition
to the individual column indexes may be worth it. You might want
to avoid prepared statements, since those are planned for the
general case and can fall down badly for the extremes.

> 3. Use a functional index that combines multiple columns and
> only query via these, so that statistics are gathered for the
> expression. I.e. create index i on t((tenantid||column1)) and
> SELECT ... FROM t WHERE tenantid||column1 = '...'

I would hold off on that until I saw evidence of a specific need.

> 4. Play with n_distinct and/or set the statistics for the inner
> columns to some fixed values that lead to the plans that we
> want.

Try not to think in terms of "plans we want", but in terms of
modeling your costs so that, given your tables and indexes, the
PostgreSQL planner will do a good job of picking a fast plan. You
normally need to tweak a few of the costing factors to match the
reality of your server and load.

> 5. Have a completely different schema and maybe a database per
> tenant.
>
> Are there any other options?

If most queries operate within a single tenant and you have fewer
than 100 tenants, you might think about partitioned tables. Beyond
100, or if most queries need to look at many partitions, it
becomes problematic.

> I don't think #2 is actually an option. We have some tenants
> with many (sometimes 100s of) millions of rows per table, and
> picking the wrong index would be disastrous.

You can always drop counter-productive or unused indexes. I think
it's best to look at indexes, as much as possible, as database
tuning options rather than something with any semantic meaning.
If you're doing things right, an index will never change the
result of any query, so you are free to try different tunings and
see what's fastest with your schema and your production load.

We have tables with 100s of millions of rows where everything is
indexed by county number. The biggest county has about 20% of the
rows; the smallest about 0.05%. We generally do pretty well with
(1) and (2). We do occasionally find it useful to create an index
with a WHERE clause, though. You might want to consider those.
A few illustrative sketches follow below.
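To make option (1) concrete, here is a minimal sketch against a
hypothetical "orders" table -- every object name here is invented
for illustration, not taken from your schema:

    -- hypothetical multi-tenant table; PK leads with tenant_id
    CREATE TABLE orders (
        tenant_id  integer NOT NULL,
        order_id   integer NOT NULL,
        status     text    NOT NULL,
        created_on date    NOT NULL,
        PRIMARY KEY (tenant_id, order_id)
    );

    -- single-column indexes per option (1)
    CREATE INDEX orders_status_idx  ON orders (status);
    CREATE INDEX orders_created_idx ON orders (created_on);

    -- the planner can combine these at run time:
    EXPLAIN SELECT *
      FROM orders
     WHERE tenant_id = 42
       AND status = 'open'
       AND created_on > '2011-01-01';

When this technique kicks in, the plan shows a BitmapAnd node over
two or more Bitmap Index Scans.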
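For option (2), the occasional multicolumn index for a hot access
path would sit alongside the single-column indexes above (again,
names are made up):

    -- one multicolumn index for a frequent access pattern,
    -- leading with tenant_id per your convention
    CREATE INDEX orders_tenant_status_idx
        ON orders (tenant_id, status);

The caveat about prepared statements is that they are planned
without knowing the parameter values, so the plan can't take into
account that one tenant is huge and another tiny; sending queries
with literal values lets the planner use the per-value statistics.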
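On costing factors and option (4), these are the kinds of knobs I
mean -- the values below are placeholders to show the syntax, not
recommendations; they would normally go in postgresql.conf rather
than per-session:

    -- adjust costing to reflect the actual hardware and caching
    SET random_page_cost = 2.0;         -- lower if mostly cached
    SET effective_cache_size = '12GB';  -- OS cache + shared_buffers

    -- per-column statistics overrides from option (4)
    ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
    -- the n_distinct override requires 9.0 or later
    ALTER TABLE orders ALTER COLUMN status SET (n_distinct = 20);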
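If you do explore partitioning by tenant, a sketch of the
inheritance-based approach (using the same hypothetical table)
looks like this:

    -- one child table per tenant, constrained to that tenant
    CREATE TABLE orders_tenant_42 (
        CHECK (tenant_id = 42)
    ) INHERITS (orders);

    -- with constraint_exclusion = partition (the default),
    -- a query with WHERE tenant_id = 42 skips the other children

The per-child maintenance (indexes, constraints, triggers to route
inserts) is what makes this painful past a modest tenant count.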
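And an example of the indexes with a WHERE clause I mentioned --
assuming, hypothetically, that "open" orders are a small, hot
subset:

    -- a partial index: small, and considered by the planner only
    -- for queries whose WHERE clause implies status = 'open'
    CREATE INDEX orders_open_idx
        ON orders (tenant_id, created_on)
        WHERE status = 'open';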
-Kevin