Am 19.03.2012 um 13:22 schrieb Bill Moran <wmoran@xxxxxxxxxxxxxxxxx>: > In response to Janning Vygen <vygen@xxxxxxxxxxx>: >> >> I am working on postgresql 9.1 and loving it! >> >> Sometimes we need a full database dump to test some performance issues >> with real data. >> >> Of course we don't like to have sensible data like bunches of e-mail >> addresses on our development machines as they are of no interest for >> developers and should be kept secure. >> >> So we need an anonymized database dump. I thought about a few ways to >> achieve this. >> >> 1. Best solution would be a special db user and some rules which fire on >> reading some tables and replace privacy data with some random data. Now >> doing a dump as this special user doesn't even copy the sensible data at >> all. The user just has a different view on this database even when he >> calls pg_dump. >> >> But as rules are not fired on COPY it can't work, right? >> >> 2. The other solution I can think of is something like >> >> pg_dump | sed > pgdump_anon >> >> where 'sed' does a lot of magical replace operations on the content of >> the dump. I don't think this is going to work reliable. >> >> 3. More reliable would be to dump the database, restore it on a >> different server, run some sql script which randomize some data, and >> dump it again. hmm, seems to be the only reliable way so far. But it is >> no fun when dumping and restoring takes an hour. >> >> Does anybody has a better idea how to achieve an anonymized database dump? > > I highly recommend #3. It's how we do it where I work. > > At first it seems like a big, slow, complicated monster, but once you've > built the tools and have it running reliably it's very nice. Our system > does the dumps overnight via cron (we have over 100 production databases) > then changes the sensitive data, as well changing all the passwords to > "password" so developers can easily log in as any account. During the > day, the developers have access to all the sanitized dump files and can > use them to make as many testing databases as they need. Yes, the data > gets up to 24 hours out of date, but it's never been a problem for us. Thanks for your response and your insights to your process. Sounds reasonable. Regards Janning -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general