> Assuming that the data is mostly created from whole cloth each
> morning, it might do to have two dbs, and rename one to replace the
> other when you're ready. Gives you 20 or so hours to discover a screw
> up and still have the backup db before you toss it away to build the
> next day

For this database, we're dumping in the neighborhood of 75GB of data each morning; this is mostly data warehousing. I'm not sure how effective a filesystem snapshot is going to be since the whole db is so large.

The aborted transaction in pg_clog sounds similar to what I was dreaming of, but "very hard to do" doesn't sound very promising.

Here's the approach we've come up with today to help combat it; let me know your thoughts on it.

We're mostly warehousing logistics data. We have an import table that the files are uploaded to, a set of tables for today's data processing, a set of tables for yesterday's results (we do some cross-day comparisons that require it), and a set of partitioned archive tables.

The data is uploaded into the import table, the previous-day tables are moved to history, and then the current-day tables are moved to previous day, leaving the current-day tables ready to be populated. The import table is then normalized into several current-day tables, mostly split into dynamic and static data. Some fields are normalized to reduce size, e.g. we store a city_id int instead of a 60-character field. (There's a rough SQL sketch of this at the end of the message.) Reports and stats are generated off the current day's records and the comparison of the two days.

Because we calculate things like the durations of certain statuses, it's not as simple as "delete everything with a certain date and then copy the appropriate dates from history into current and previous respectively." We effectively alter some fields that cannot be "unaltered" short of a restore of some kind, and due to speed concerns we've not found a way around that. We'd like to store the fields needed to calculate the durations until after the duration ends, but that can be quite lengthy; instead, triggers update a table that tracks key events. This is all what we currently do.

This means we'd have to have a HUGE filesystem snapshot to do it like that. To do a pg_dump restore, we'd have to restore one of the partitions, which is 250GB (2 weeks of data per partition) - it's a lengthy process.

We're looking at adding a set of daily queue tables that sit right before the archive: write the data to those tables and then, if everything checks out, dump it to history. This way, we should only ever need a dump of the current, previous, and queue tables to be able to "undo".

Suggestions?
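
For concreteness, here's roughly what the daily rotation looks like in SQL. Table and column names (import, current_day, previous_day, cities, archive_current_part, shipment_id, etc.) are illustrative stand-ins, not our actual schema, and the real version touches several static/dynamic tables rather than one:

BEGIN;

-- yesterday's data goes into the partitioned archive
-- (archive_current_part is whichever ~2-week partition is active)
INSERT INTO archive_current_part SELECT * FROM previous_day;

-- rotate: previous <- current, then start the new day empty
TRUNCATE previous_day;
INSERT INTO previous_day SELECT * FROM current_day;
TRUNCATE current_day;

-- normalize the uploaded rows into the current-day tables,
-- e.g. store a city_id int instead of the 60-character city name
INSERT INTO current_day (shipment_id, city_id, status, status_ts)
SELECT i.shipment_id, c.city_id, i.status, i.status_ts
FROM import i
JOIN cities c ON c.city_name = i.city_name;

TRUNCATE import;
COMMIT;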
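
The key-events table the triggers maintain is along these lines (again simplified; "shipment" and "status" are stand-ins for our real columns):

CREATE TABLE key_events (
    shipment_id integer NOT NULL,
    old_status  text,
    new_status  text,
    changed_at  timestamp NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_status_change() RETURNS trigger AS $$
BEGIN
    -- record the transition so durations can be computed later
    IF NEW.status IS DISTINCT FROM OLD.status THEN
        INSERT INTO key_events (shipment_id, old_status, new_status)
        VALUES (NEW.shipment_id, OLD.status, NEW.status);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER track_key_events
    AFTER UPDATE ON current_day
    FOR EACH ROW EXECUTE PROCEDURE log_status_change();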
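
And the queue idea we're considering would change the archive step to something like this (archive_queue is a hypothetical name):

-- instead of archiving previous_day directly, park it in a daily queue
BEGIN;
INSERT INTO archive_queue SELECT * FROM previous_day;
COMMIT;

-- ... run the day's sanity checks against archive_queue ...

-- only if everything checks out does the queue get flushed to history
BEGIN;
INSERT INTO archive_current_part SELECT * FROM archive_queue;
TRUNCATE archive_queue;
COMMIT;

If something turns out to be wrong, the unarchived data is still sitting in the current, previous, and queue tables, so we can rebuild from a dump of just those three sets of tables instead of restoring a 250GB partition.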