Re: Clustered index to preserve data locality in a multitenant application?

Vick Khera <vivek@xxxxxxxxx> · Tue, 30 Aug 2016 13:26:33 -0400

On Tue, Aug 30, 2016 at 7:10 AM, Nicolas Grilly
<nicolas@xxxxxxxxxxxxxxxx> wrote:
> Let's say we have a table containing data for 10,000 tenants and 10,000 rows
> per tenant, for a total of 100,000,000 rows. Let's say each 8 KB block
> contains ~10 rows. Let's say we want to compute the sum of an integer column
> for all rows belonging to a given tenant ID.

I'll assume you have an index on the tenant ID. In that case, your
queries will be pretty fast.
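
For a rough sense of scale, the scenario quoted above can be put into
numbers. This is only a back-of-the-envelope sketch in Python: it uses the
figures from the question (10,000 rows per tenant, ~10 rows per 8 KB block),
the function name and the two cases are illustrative assumptions, and it
ignores caching, index reads, and storage speed, all of which decide how much
the difference matters in practice.

```python
import math

BLOCK_SIZE_BYTES = 8 * 1024  # 8 KB blocks, as in the quoted question


def heap_blocks_touched(rows_per_tenant, rows_per_block, clustered):
    """Estimate how many table blocks hold one tenant's rows.

    clustered=True  assumes the tenant's rows are packed together (best case).
    clustered=False assumes every row sits in a different block (worst case).
    """
    if clustered:
        return math.ceil(rows_per_tenant / rows_per_block)
    return rows_per_tenant


best = heap_blocks_touched(10_000, 10, clustered=True)
worst = heap_blocks_touched(10_000, 10, clustered=False)

print(best, "blocks,", best * BLOCK_SIZE_BYTES // 1024, "KB")    # 1000 blocks, 8000 KB
print(worst, "blocks,", worst * BLOCK_SIZE_BYTES // 1024, "KB")  # 10000 blocks, 80000 KB
```

So the two extremes differ by a factor of ten in blocks touched for one
tenant; real tables will usually fall somewhere in between.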

On some instances, we have multi-column indexes starting with the
tenant ID, and those are used very effectively as well.

I never worry about data locality.

Depending on your data distribution, you may want to consider table
partitions based on the tenant ID. I personally never bother partitioning
by tenant; where I do partition, I split on some other key in the data.
