Johannes.

On Tue, May 12, 2020 at 8:05 PM Johannes Linke <johannes.linke@xxxxxxxxx> wrote:
> since 9.4, VACUUM FREEZE just sets a flag bit instead of overwriting xmin with FrozenTransactionId [1]. This makes it harder to build applications with a focus on data reduction.
>
> We have an app that lets people anonymously vote on stuff exactly once. So we save the vote in one table without any explicit connection to the voting user, and separate from that a flag that this person gave their vote. That has to happen in the same transaction for obvious reasons, but now the xmin of those two data points allows to connect them and to de-anonymize the vote.
>
> We can of course obfuscate this connection, but our goal is to not keep this data at all to make it impossible to de-anonymize all existing votes even when gaining access to the server. The best idea we had so far is more of a workaround: Do dummy updates to large parts of the vote table on every insert so lots of tuples have the same xmin, and then VACUUMing.[2]

And even without the xmin, someone could dump the ctid values and correlate them if you are not careful. Your problem is going to be hard to solve without taking extra steps. I think a transaction which moves out all the votes for a period (using an INSERT INTO fed by the result of a DELETE ... RETURNING) and then inserts them back (with something like an INSERT INTO ... SELECT ... ORDER BY random()) may work (you may even throw in a shuffled flag along the way). And then throw in a VACUUM so the next batch of inserts overwrites the freed space. But for someone with the appropriate access to the system, partial de-anonymization is possible unless you take very good measures.

Think of it: here in Spain we use ballot boxes, but voter order is recorded (they do a double-entry check: you get looked up in an alphabetic list, your name is copied onto a time-ordered list, and your position on that list is recorded in the alphabetic one; all on paper, a nice system, easy to audit, hard to cheat).
If you can freeze time, you can carefully pick votes up from the box and partially correlate them with the list; even in boxes much larger than the voting envelopes, they tend to stack in a nice order. And this is with paper; computers are much better at ordering everything for no particular purpose, because it is easier to do it that way.

> Does anyone have a suggestion better than this? Is there any chance this changes anytime soon? Should I post this to -hackers?

Something which may be useful is to use a staging table for newly inserted votes and periodically move them in batches, shuffling them, to a more permanent one, and use a view to join them. You can even do that with some fancy partitioning and an extra field. And move some users' already-voted flags too, in a different transaction.

Doing some of these things and adding some old votes to the moving sets should make things difficult to track, but it all depends on how hard your anonymization requirements are (I mean, the paper system I've described leaves my vote perfectly identifiable when I've just voted, but that is generally regarded as a non-issue, and I suspect any system you can think of leaves the last vote identifiable for a finite amount of time).

In general, move data around, in single transactions so you do not lose anything, like shaking a ballot box periodically (but ensure the lid is properly taped first).

Francisco Olarte.