In response to Kiriakos Georgiou <kg.postgresql@xxxxxxxxxxxxxx>:

> The data anonymizer process is flawed because you are one misstep away
> from data spillage.

In our case, it's only one layer.  Other layers that exist:

* The systems where this test data is instantiated can't send email
* The systems where this data exists have limited access (i.e., not all
  developers can access it, and it's not used for typical testing -- only
  for specific testing that requires production-like data)

You are correct, however, that there's always a danger of spillage if new
sensitive data is added and the sanitization script isn't updated to match.
That's part of the ongoing overhead of maintaining such a system.

> Sensitive data should be stored encrypted to begin with.  For test
> databases, you or your developers can invoke a process that replaces the
> real encrypted data with fake encrypted data (for which everybody has the
> key/password).  Or, if the overhead is too much (i.e. billions of rows),
> you can have different decrypt() routines on your test databases that
> return fake data without touching the real encrypted columns.

The thing is, this process has the same potential for data spillage as
sanitizing the data.  I find it intriguing, though, and I'm going to see
whether there are places where this approach has advantages over our
current one.  Since much of our sensitive data is already de-identified,
it would add another layer of protection there as well.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
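
A minimal sketch of the test-database decrypt() override discussed above,
assuming (hypothetically) that the application decrypts through a wrapper
function such as app.decrypt_ssn() rather than calling pgcrypto directly,
so the test database can ship a different body for the same signature; the
schema, function, and column names here are illustrative, not from the
original posts:

    -- On the TEST database only: replace the production decryption wrapper
    -- with a body that never touches real keys or plaintext.
    CREATE SCHEMA IF NOT EXISTS app;

    CREATE OR REPLACE FUNCTION app.decrypt_ssn(ciphertext bytea)
    RETURNS text
    LANGUAGE sql STABLE
    AS $$
        -- Derive a deterministic, obviously fake value from the ciphertext
        -- so the same row always "decrypts" to the same fake string (handy
        -- for joins), while the stored encrypted column is left untouched.
        SELECT 'XXX-XX-' || substr(md5(encode(ciphertext, 'hex')), 1, 4);
    $$;

    -- Example use on the test box (ssn_enc is a hypothetical column name):
    --   SELECT app.decrypt_ssn(ssn_enc) FROM customers LIMIT 5;

Because the fake output is derived from the ciphertext rather than random,
repeated rows stay consistent across queries, which keeps joins and
duplicate checks behaving roughly as they would against real data.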