Re: How to synchronize tables in remote (production) and local databases when the structure of the local database's table has been modified

On 10/8/14, 2:49 AM, Vitaly Isaev wrote:
Hello,
I have run into a complicated case of table synchronisation. The full description of the problem (with some figures) is posted here: http://stackoverflow.com/questions/26237661/postgresql-update-table-with-new-records-from-the-same-table-on-remote-server
Here is a partial repost of my Stack Overflow topic:

    We have a PostgreSQL server running in production and plenty of workstations with isolated development environments. Each one has its own local PostgreSQL server (with no replication from the production server). Developers need to receive the updates stored on the production server periodically.
    I am trying to figure out how to dump the contents of several selected tables from the server in order to update the tables on the development workstations. The biggest challenge is that the tables I'm trying to synchronize may have diverged (developers may add - but not delete - fields to the tables through the Django ORM, while the schema of the production database remains unchanged for long periods).

    Therefore the updated records and new fields of the tables stored on the workstations /must be/ preserved from being overwritten.

    I guess that direct dumps (e.g. pg_dump -U remote_user -h remote_server -t table_to_copy source_db | psql target_db) are not suitable here.

    *UPD*: If possible, I would also like to avoid using a third (intermediate) database while transferring the data from the production database to the workstations.

I have no idea how to work this out. Any help will be appreciated.
Sincerely,

The way I've handled this in the past is to use pg_dump -a -t. That should be robust against added columns in the destination.
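
Something along these lines, with the connection details and table name as placeholders:

pg_dump -a -U remote_user -h remote_server -t table_to_copy source_db | psql target_db

The -a switch dumps data only, as COPY statements with explicit column lists, so the load shouldn't care about extra columns that exist only in the destination table.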

My overall recommendation though is *never put data in a dev database that you can't recreate*. If you need test data, create a script that generates that data for you.

It's basically guaranteed that you'll sometimes need to blow away dev databases and start over, so it's best to just build the infrastructure to support that and be able to do it as often as you want.
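
For example, a seed script can be as simple as a few INSERT ... SELECT statements driven by generate_series (the table and column names below are made up, and the table itself is assumed to come from your schema load):

-- seed_dev_data.sql, run with: psql dev -f seed_dev_data.sql
INSERT INTO customers (id, name, signup_date)
SELECT g, 'customer_' || g, now() - (g || ' days')::interval
FROM generate_series(1, 1000) AS g;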

BTW, there is a trick you can use to speed this process up. If you create a database that has most everything you need in it, you can then use that as a template for creating new databases via createdb -T. That is MUCH faster than manually inserting data. So what I'd suggest is something like:

createdb master
<load production schema and necessary data into master>
createdb -T master dev

... do development work

When you need to rebuild your dev database you would then do:

dropdb dev
createdb -T master dev
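
For the "<load production schema and necessary data into master>" step, one option (reusing the placeholder connection details from the example above) is a schema-only dump followed by data-only dumps of just the tables you need:

pg_dump -s -U remote_user -h remote_server source_db | psql master
pg_dump -a -U remote_user -h remote_server -t table_to_copy source_db | psql master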

I actually kept multiple master copies around, based on the date of the dump from production. It was sometimes useful for testing things. I also kept the production dumps in version control (note that I was careful not to dump any sensitive data).
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

