Search Postgresql Archives

Re: Anonymized database dumps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 19.03.2012 um 13:22 schrieb Bill Moran <wmoran@xxxxxxxxxxxxxxxxx>:

> In response to Janning Vygen <vygen@xxxxxxxxxxx>:
>> 
>> I am working on postgresql 9.1 and loving it!
>> 
>> Sometimes we need a full database dump to test some performance issues 
>> with real data.
>> 
>> Of course we don't like to have sensible data like bunches of e-mail 
>> addresses on our development machines as they are of no interest for 
>> developers and should be kept secure.
>> 
>> So we need an anonymized database dump. I thought about a few ways to 
>> achieve this.
>> 
>> 1. Best solution would be a special db user and some rules which fire on 
>> reading some tables and replace privacy data with some random data. Now 
>> doing a dump as this special user doesn't even copy the sensible data at 
>> all. The user just has a different view on this database even when he 
>> calls pg_dump.
>> 
>> But as rules are not fired on COPY it can't work, right?
>> 
>> 2. The other solution I can think of is something like
>> 
>> pg_dump | sed > pgdump_anon
>> 
>> where 'sed' does a lot of magical replace operations on the content of 
>> the dump. I don't think this is going to work reliable.
>> 
>> 3. More reliable would be to dump the database, restore it on a 
>> different server, run some sql script which randomize some data, and 
>> dump it again. hmm, seems to be the only reliable way so far. But it is 
>> no fun when dumping and restoring takes an hour.
>> 
>> Does anybody has a better idea how to achieve an anonymized database dump?
> 
> I highly recommend #3.  It's how we do it where I work.
> 
> At first it seems like a big, slow, complicated monster, but once you've
> built the tools and have it running reliably it's very nice.  Our system
> does the dumps overnight via cron (we have over 100 production databases)
> then changes the sensitive data, as well changing all the passwords to
> "password" so developers can easily log in as any account.  During the
> day, the developers have access to all the sanitized dump files and can
> use them to make as many testing databases as they need.  Yes, the data
> gets up to 24 hours out of date, but it's never been a problem for us.

Thanks for your response and your insights to your process. Sounds reasonable. 

Regards
Janning
-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux