Re: Staging Database

David Johnston <polobo@xxxxxxxxx> · Wed, 7 Aug 2013 08:23:15 -0700 (PDT)

BladeOfLight16 wrote
> The company I work for has a client who has expressed interest in having a
> staging database for their data. Staging as in they add data, do some
> QCing, then push to a database with an identical schema to make it public.
> Fortunately, we're not doing that right now, but it's something they may
> want later on. So I wanted to at least start thinking about it. What are
> some ways of going about it? Some kind of replication set up comes to
> mind,
> but it would need to be on demand, not constant. It also could be as
> infrequent as quarterly, if that makes a difference. Preferably, it would
> only require one server with PostgreSQL installed, but I'm not ruling out
> multiple servers.

My first option to evaluate would be simple schema delineation.  Have one or
more schemas inside the same database where you place "staging" data.  Have
specific "staging" users that can only see those schemas and can perform the
work on them.  Once all the staged data has been
validation/modified/whatever you have a function that will migrate the
staging data from the staging schemas to the production/live schemas.

If you want to use only one server than whatever you are going to do with
the staging data cannot overly tax said box otherwise that requirement is
bogus.  Otherwise two boxes means at least a second postgreSQL installation
on the second box.  I'm out of my experience here but to then get the data
from one box to the other you can use something like dbLink/FDW or even
dumping the data to CSV and re-importing it in production (ideally still
using the same staging tables; you have the second server with the staging
tables also so you can perform resource-intensive actions).

Normally you are only altering a small portion of the data inside the
database and so should attempt to only "manage" that subset.  One question
is once all of the data has been placed into production what happens to it? 
Would the "new" staging area contain ALL information even that previously
imported and unchanged or does it only reflect changes from the last
production load?  How and what kind of QC is going to be performed and will
you need to build interfaces to support those activities?  Would you end up
QCing the same data time-after-time even if it was unchanged.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Staging-Database-tp5766603p5766665.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general