Search Postgresql Archives

Re: Run statements before pg_dump in same transaction?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


François Beausoleil asked:
> To that end, we'd like to anonymize the data before it leaves the database server.
>
> One solution we thought of would be to run statements prior to pg_dump, but within 
> the same transaction, something like this:
>
> BEGIN;
> UPDATE users SET email = 'dev+' || id || '@example.com', password_hash = '/* hash of "password" */', ...;
> -- launch pg_dump as usual, ensuring a ROLLBACK at the end
...
> Is there a ready-made solution for this?

No - at least not with generating a dump and scrubbing *before* the data comes 
out. Some other ideas:

* Periodically do a full dump to another database under your control, sanitize the 
data, and make all dev dumps come from *that* database. Process roughly becomes:

* pg_dump herokudb | psql dev_temp
* <sanitize dev_temp>
* drop existing dev_old; rename devdb to dev_old; rename dev_temp to devdb
* Devs can pg_dump devdb at will

That still moves your sensitive data to another server though, even temporarily.
Another approach is to use the -T flag of pg_dump to exclude certain tables. 
Make modified copies of them on the server, then rename them after the dump 
(or simply put them in a new namespace):

* (create a dev.users identical (including indexes, etc.) to public.users)
* truncate table dev.users;
* insert into dev.users select * from public.users;
* update dev.users set email = 'dev' + || ...etc. <e.g. sanitize data>
* pg_dump -d <herokus DATABASE_URL> -T public.users > devs_use_this.pg

Then dev could do: set schema = dev, public;

Or you could simply move the sanitized table back:

alter table dev.users set schema public;

If you are going to rename, it may be simpler to not make an identical copy 
of the affected tables (i.e. with indexes) but just a data-only copy:

create table dev.users as select * from public.users;
<sanitize dev.users>
pg_dump herokudb --schema-only | psql devdb
pg_dump herokudb --data-only -T public.users | psql devdb
psql devdb -c 'insert into public.users select * from dev.users'


- -- 
Greg Sabino Mullane greg@xxxxxxxxxxxx
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201703240911
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAljVHHsACgkQvJuQZxSWSshUbgCg7TzCkAzT4wKoKd5/2rruzLte
TJcAoI7AvGdGzlNp5b3N+LFJ9DWIZ8/C
=7heB
-----END PGP SIGNATURE-----




-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux