I am looking for suggestions on setting up a large Postgres database. We are running a science
project and expect a lot of data from the science instrument. If you have time to comment, any
advice is most welcome!
Here's the deal:
1. We expect to store ~1TB per year of data for 10 years. Mostly this is a warehouse situation: not
many updates or deletes, but a lot of inserts.
2. We need to replicate a subset of our postgres data to an international science community, and in
turn take in some data from them (we plan to do this via Slony-1 unless there is a compelling reason
not to).
3. We need to make a copy of our database available to the general internet community.
4. We need to have a Warm Standby available in case of disaster. We plan to use PITR with WAL files
for this (again, unless there is a compelling reason not to).
5. We need to make regular full tape backups (~weekly) and perform occasional scheduled maintenance
(think quarterly).
We do not have an endless budget, sadly, so I could use some help on how to go about this. Having
gone from a job where my database software actually had paid tech support to one that doesn't
(PostgreSQL), I am pretty concerned about what could go wrong.
Assume our Primary server (A) is good enough to serve our in-house users, and our Warm Standby (B)
is a physical duplicate of A. My plan is to ship WAL files to B and make a tape backup from B
weekly, either by holding B out of recovery for ~6 hours while the copy runs, or by snapshotting
B's data files at a given time and taping off the snapshot. That takes care of A and B, and the
backups, but what about the other requirements?
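Concretely, the WAL-shipping piece would look something like the following. This is only a sketch:
the hostnames, paths, and the wait_for_wal helper are placeholders I made up.

    # postgresql.conf on A: ship each finished WAL segment to B
    archive_command = 'rsync -a %p standby-b:/wal_archive/%f'

    # recovery.conf on B: stay in recovery, replaying segments as they
    # arrive. A plain cp would end recovery at the first missing file,
    # so the restore command has to block until the next segment shows
    # up, e.g. via a trivial script:
    restore_command = '/usr/local/bin/wait_for_wal /wal_archive/%f %p'

    # /usr/local/bin/wait_for_wal (made-up name), roughly:
    #   while [ ! -f "$1" ]; do sleep 5; done
    #   cp "$1" "$2"

For the weekly tape run, the idea would be to shut B's postmaster down, tar its data directory off
to tape, then restart it in recovery; segments simply pile up in /wal_archive in the meantime and
get replayed once B catches back up.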
How do we get data to our web community without exposing the primary to hacking, and how do we do
that economically? One plan in place is to use a set of snapshot disks of A's data files to act as
the web database's files. Can we do that? Is that exceptionally stupid? Another plan involves using
a Solaris 10 ZFS solution to clone the warm standby B's files to act as the web database's files
(see: http://www.lethargy.org/~jesus/archives ... crack.html for more). I am not sure either of the
above solutions will work quickly enough. We'd like a turnaround time from A to B to web of less
than 30 minutes for newly created tables or new data in existing tables.
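For what it's worth, my mental sketch of the ZFS route is below. The dataset names are invented,
and it assumes B's WAL replay is quiesced when the snapshot is taken; ZFS clones are copy-on-write,
which is what makes the sub-30-minute turnaround look plausible.

    # on B (Solaris 10): snapshot the standby's data and clone it
    SNAP=web-`date +%Y%m%d%H%M`
    zfs snapshot tank/pgdata@$SNAP
    zfs clone tank/pgdata@$SNAP tank/pgweb    # near-instant, copy-on-write

    # unlike B itself, the clone's restore command must NOT block:
    # let it replay whatever is already archived, then open up
    echo "restore_command = 'cp /wal_archive/%f %p'" > /tank/pgweb/recovery.conf
    rm -f /tank/pgweb/postmaster.pid
    pg_ctl -D /tank/pgweb -o '-p 5433' start  # second postmaster for the web copy

Refreshing the web copy would then be: stop the web postmaster, destroy the clone and snapshot
(zfs destroy), and repeat.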
Lastly, we plan to pinhole our firewall for trusted Slony-1 science "customers": people we already
know, with specific IP addresses. We have yet to figure out the drag on our Primary (A) due to
Slony-1. Any experience with that out there?
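For reference, my understanding is that Slony-1 attaches a trigger to every replicated table and
records each row change in its sl_log tables on the origin, so write volume on A roughly doubles
for the replicated tables, and the remote slon daemons then pull from those logs. The subscription
itself is driven by a slonik script, something like the sketch below (cluster, node, and table
names are invented, and it assumes the nodes and paths have already been stored):

    slonik <<'EOF'
    cluster name = science;
    node 1 admin conninfo = 'host=primary-a dbname=science user=slony';
    node 2 admin conninfo = 'host=partner-site dbname=science user=slony';

    # one replication set holding the tables we share
    create set (id = 1, origin = 1, comment = 'shared science tables');
    set add table (set id = 1, origin = 1, id = 1,
                   fully qualified name = 'public.observations');

    # partner node 2 pulls from us; forward = no keeps it a leaf
    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
    EOF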
My prior work experience involves a 1TB Sybase database, its warm standby, and regular backups and
quarterly maintenance. I am new to PostgreSQL, and the idea of no tech-support phone calls when
things break is a scary one! I am trying to create a belt-and-suspenders redundant solution so that
if something breaks, I have time to figure out what went wrong and fix it before the users even
know there's a problem.