From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Menelaos Perdikeas, Semantix
Sent: Friday, August 03, 2012 4:05 PM
To: pgsql-general@postgresql.org
Subject: strategies for segregating client data when using PostgreSQL in a web app

I would like to know the best practices / common patterns (or pointers to such) for using PostgreSQL in the context of a "big" web application with substantial data per user.

Namely, we are designing an ERP / accounting / business intelligence web application where each client company will have its own substantial data. The application will be fronted by a JBoss Application Server with PostgreSQL at the back. We are targeting a few thousand clients at the maximum, and that only after 2-3 years at the earliest.

I understand that there could be several approaches. In the following I am assuming we are running only one PostgreSQL server instance (process), or perhaps a few (3-4) in a cluster, but I don't suppose that affects the options below much. So, I see the following options:

[1] Use just one database and one schema, and logically segregate each company's data by having every table carry a client_id column as part of its primary key.
[2] Use multiple databases (in the same server instance) and only the public schema in each of them for the customer's data.
[3] Use one database and multiple schemas to separate the different customers' data.

(Options [2] and [3] in particular seem practically indistinguishable to me.)

What are the trade-offs in terms of:

[1] enforcing security and access separation;
[2] administering the database and responding to inquiries like "please reset my company's data to the image of yesterday because we messed up some tables", "we are taking our business elsewhere, can we please have a dump of our data?", or "we would like a weekly DVD with our data";
[3] backup / restore and partitioning;
[4] potential for connection pooling at the Application Server.

Any information, even pointers for further study, will be greatly appreciated.

-- Menelaos.

========================================================================================

One approach I have been considering is:

1) A primary database for "global" information
2) Per-client databases for "local" information, each maintaining a local cache of whatever "global" information it needs
2a) Read-only slaves of the client databases
3) An auxiliary database to store client data that is to be consolidated with other clients' data

There are a lot of considerations and trade-offs that need to be evaluated:

- Level of client access
- Level of third-party access
- Desirability of on-premise deployment
- Your resources and timeframe

Generally I would suggest identifying the different types of data/owners and keeping their tables in separate schemas. Additionally, I would allow for the possibility of multiple "clients" having data in the same physical tables. Beyond that, decide what kind of client/internal meta-data will be necessary to organize "modules".

David J.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
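[Editor's note] For option [3] above (one database, one schema per customer), a minimal sketch of how tenant separation might be wired up. The tenant name "acme" and role "acme_role" are illustrative assumptions, not names from the thread:

```sql
-- Hypothetical tenant "acme": one dedicated role and one dedicated schema.
CREATE ROLE acme_role LOGIN PASSWORD 'changeme';
CREATE SCHEMA acme AUTHORIZATION acme_role;

-- Keep the tenant's role out of everyone else's namespace,
-- and make its own schema the default resolution target.
REVOKE ALL ON SCHEMA public FROM acme_role;
ALTER ROLE acme_role SET search_path = acme;

-- Application tables created under this role land in the tenant's schema.
CREATE TABLE acme.invoices (
    id    serial PRIMARY KEY,
    total numeric(12,2) NOT NULL
);
```

With this layout, the "can we have a dump of our data?" request from the original post maps onto a single command, e.g. `pg_dump --schema=acme mydb > acme.sql`, and a per-tenant reset can restore just that one schema without touching other clients. Under option [1] (shared tables with a client_id column) the same isolation has to be enforced in the application or via row-level security policies instead.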