Interesting point you made about the read to write ratio of 1 to 15.
How frequently will you be adding new entities, or, in the case of storing the customers in one database table, how frequently will you be adding new objects of a given entity type? How many entity types do you foresee existing, i.e. just "Customer"? Will Customer have subtypes, or is Customer the single entity in the database?
How frequent and how long-running are write operations, and are they heavily transaction-based? What is the maximum number of customers? How much data (approximately) will a single customer record consume in bytes, and at what rate (in bytes) does it grow?
Will your system need to support any type of complex reporting in the future (despite it being write-intensive)?
I'd take a look at memcached, plproxy, pgpool, and some of the other cool stuff in the PostgreSQL community.
At a minimum, it might help you architect the system in such a manner that you don't box yourself in.
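If plproxy is new to you, the idea is that a proxy function routes each call to one of several partition databases based on a key. A sketch roughly along the lines of its documented example (the function, cluster, and parameter names here are only placeholders):

    -- Proxy function: routes the call to one partition of the 'userdb' cluster
    -- based on a hash of the username.
    CREATE FUNCTION get_user_email(i_username text)
    RETURNS SETOF text AS $$
        CLUSTER 'userdb';
        RUN ON hashtext(i_username);
    $$ LANGUAGE plproxy;

Each partition database then defines an ordinary function with the same signature that does the real work.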
Lastly, KV stores for heavily write-intensive operations in distributed environments are certainly interesting - a hybrid solution could work.
Sounds like a fun project!
Bryan
On Fri, Jun 25, 2010 at 7:02 PM, Greg Smith <greg@xxxxxxxxxxxxxxx> wrote:
Kevin Grittner wrote:
> My guess would be that this app will end up being best split by
> schema.

I wonder whether it *also* needs to be split by database, too. 2000 clusters is clearly a nightmare, and putting all the client data into one big table has both performance and security issues; that leaves database and schema as possible splits. However, having 2000 databases in a cluster is probably too many; having 2000 schemas in a database might also be too many. There are downsides to expanding either of those to such a high quantity.
> A schema is a logical separation within a database. Table
> client1.account is a different table from client2.account. While a
> user can be limited to tables within a single schema, a user with
> rights to all the tables can join between them as needed. You could
> put common reference data in a public schema which all users could
> access in addition to their private schemas.
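As a very rough sketch of that layout (the schema, table, and role names below are only examples):

    -- One role and one schema per client; "client1" and "account" are example names.
    CREATE ROLE client1 LOGIN;
    CREATE SCHEMA client1 AUTHORIZATION client1;
    CREATE TABLE client1.account (id serial PRIMARY KEY, name text);

    -- Shared reference data lives in the public schema, readable by every client role.
    CREATE TABLE public.country (code char(2) PRIMARY KEY, name text);
    GRANT SELECT ON public.country TO client1;

    -- With the default search_path of "$user", public, client1 resolves unqualified
    -- table names in its own schema first and can still read the shared data.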
In order to keep both of those in the range where they perform well and are manageable, it may be that what's needed is, say, 50 databases with 40 schemas each, rather than 2000 of either. It's hard to say what the ideal ratio is. However, I think that at the application design level it would be wise to treat each client as having a database+schema pair unique to them, with the assumption that some shared data may need to be replicated to all the databases in the cluster. Then it's possible to shift the trade-off around as needed once the app is built. Building that level of flexibility in shouldn't be too hard if it's in the design from day one, but it would be a painful bit of refactoring to do later.

Once there's a prototype, some benchmark work running that app could be done to figure out the correct ratio between the two. It might even make sense to consider full scalability from day one and make the unique client connection info host:port:database:schema.
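For example (the host, database, and schema names here are made up), the per-client descriptor could be applied at connect time like this:

    -- Suppose client 1234's descriptor is db03.example.com:5432:custdb17:client1234.
    -- After connecting to that host, port, and database, pin the session to the schema:
    SET search_path TO client1234, public;

    -- Unqualified names now resolve in client1234 first:
    SELECT count(*) FROM account;   -- i.e. client1234.account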
P.S. Very refreshing to get asked about this before rather than after a giant app that doesn't perform well is deployed.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx www.2ndQuadrant.us