On Thu, May 3, 2012 at 8:42 AM, David Welton <davidw@xxxxxxxxxxx> wrote: > Hi, > > We have a situation where HIPAA data that needs to be encrypted. > Since we have lots of users, and a number of users who access the data > of different people, we cannot simply encrypt the disk and call it > good - it's not fine-grained enough. > > So far, we've been encrypting each row, and that actually works out > fairly well, but now that we need to do searching and sorting, things > have naturally become a bit more difficult... > > I've been testing a few different solutions with our data, which > shouldn't exceed 10,000 rows or thereabouts, in terms of what needs to > be encrypted/decrypted/searched and sorted. > > I prototyped something like this: > http://www.doc.ic.ac.uk/teaching/distinguished-projects/2009/w.harrower.pdf > in Ruby (we're using Rails), and the performance is pretty good (well, > insertion is pretty slow, but that's ok for us), and also allows us to > search for substrings. > > However, that did nothing about sorting. So the next idea was to do > something like this: save a list of names (the data we're storing that > must be encrypted) and database row id's as a Ruby list, and encrypt > that encrypt(marshal([list ... of ... names])). The advantage over > having to decrypt single rows is that it seems to be a lot faster to > decrypt one big chunk of data rather than lots of little things. > Searching through a list of N thousand names is actually fairly quick > in Ruby, as is sorting. So... this would probably work, but it's > pretty gross as a solution in that we're going to have to manually > keep a lot of data synced, and it feels awfully strange to be doing > everything in the application. > > However, I can't think of a way to create an "index" like that in > Postgres, either. Am I overlooking something? The trick, I think, is > to keep the encrypt/decrypt operations to a minimum even if that > requires encrypting/decrypting a lot of data at once. Perhaps > something like decrypting to a temporary table, running the queries I > need, and then dumping and encrypting the table back to its binary > field? When you say 'encrypting each row', what does that mean exactly? Specifically, where is the encryption key and is it available for use in the database? If it isn't, which is naturally more secure, then sorting on the database in natural order is going to be difficult or impossible. Indexed lookups are doable though. Partial key lookups *might* be possible if you used CBC mode and a static initialization vector (which is terribly insecure). merlin -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general