From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Menelaos Perdikeas, Semantix
Sent: Friday, August 03, 2012 4:05 PM
To: pgsql-general@postgresql.org
Subject: strategies for segregating client data when using PostgreSQL in a web app

I would like to know the best practices / common patterns (or pointers to such) for using PostgreSQL in the context of a "big" web application with substantial data per user.

Namely, we are designing an ERP / accounting / business intelligence web application where each client company will have its own substantial data. The application will be fronted by a JBoss Application Server with PostgreSQL at the back. We are targeting a few thousand clients at the maximum, and that only after 2-3 years at the earliest.

I understand that there could be several approaches. In the following I am assuming we are running only one PostgreSQL server instance (process), or perhaps a few (3-4) in a cluster, but I don't suppose that affects the options below much. So, I see the following options:

[1] Use just one database and one schema, and logically segregate each company's data by having every table carry a client_id column as part of its primary key.
[2] Use multiple databases (in the same server instance) and only the public schema in each of them for the customer's data.
[3] Use one database and multiple schemas to separate the different customers' data.

(Options [2] and [3] in particular seem practically indistinguishable to me.)

What are the trade-offs in terms of:

[1] enforcing security and access separation;
[2] administering the database and responding to inquiries like "please reset my company's data to the image of yesterday because we messed up some tables", "we are taking our business elsewhere, can we please have a dump of our data?", or "we would like a weekly DVD with our data";
[3] backup / restore and partitioning;
[4] potential for connection pooling at the Application Server.

Any information, even pointers for further study, will be greatly appreciated.

-- Menelaos.

========================================================================================

One approach I have been considering is:

1) A primary database for "global" information
2) Per-client databases for "local" information, each maintaining a local cache of whatever "global" information it needs
2a) Read-only slaves of the client databases
3) An auxiliary database to store client data that is to be consolidated with other clients' data

There are a lot of considerations and trade-offs that need to be evaluated:

- Level of client access
- Level of third-party access
- Desirability of on-premise deployment
- Your resources and timeframe

Generally I would suggest identifying the different types of data/owners and keeping their tables in separate schemas. Additionally, I would allow for the possibility of multiple "clients" having data in the same physical tables. Beyond that, decide what kind of client/internal meta-data will be necessary to organize "modules".

David J.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
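[Editor's note] For option [3] above (one database, one schema per customer), a minimal sketch of how tenant separation might be wired up. The tenant name "acme" and role "acme_role" are illustrative assumptions, not names from the thread:

```sql
-- Hypothetical tenant "acme": one dedicated role and one dedicated schema.
CREATE ROLE acme_role LOGIN PASSWORD 'changeme';
CREATE SCHEMA acme AUTHORIZATION acme_role;

-- Keep the tenant's role out of everyone else's namespace,
-- and make its own schema the default resolution target.
REVOKE ALL ON SCHEMA public FROM acme_role;
ALTER ROLE acme_role SET search_path = acme;

-- Application tables created under this role land in the tenant's schema.
CREATE TABLE acme.invoices (
    id    serial PRIMARY KEY,
    total numeric(12,2) NOT NULL
);
```

With this layout, the "can we have a dump of our data?" request from the original post maps onto a single command, e.g. `pg_dump --schema=acme mydb > acme.sql`, and a per-tenant reset can restore just that one schema without touching other clients. Under option [1] (shared tables with a client_id column) the same isolation has to be enforced in the application or via row-level security policies instead.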