Not at all. He wanted a way to influence the on-disk order of the rows
in a table, so he could make sure they were not left in the order of
insertion. That should be possible with CLUSTER. From the documentation:
"When a table is clustered, it is physically reordered based on the
index information" (again, see
http://www.postgresql.org/docs/9.0/static/sql-cluster.html for full
details).
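For example (with made-up names), something like

    CLUSTER mytable USING some_idx;

rewrites mytable on disk in the order given by some_idx.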
Now, if one were to give CLUSTER an index that puts the data back in
insertion order, then it obviously wouldn't be very useful. On the other
hand, if such an index exists at all, then the insertion order of the
rows can easily be restored no matter how they are reordered... which
makes the whole thing pointless in that case anyway.
So, which index should he choose? He can either cluster on an existing
index, if that would remove all trace of the insertion order (any field
with unique values would work equally well), or he can add a field with
random values, index that field and cluster on that index. CLUSTER is
just the tool... Unless I've missed something fundamental about the
CLUSTER command, it should be perfectly possible to reorder the data on
disk in any way you want with it, as long as you first create an
appropriate index to cluster on.
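To make the second option concrete, here's a rough sketch of what I
have in mind (all table, column and index names below are made up, of
course):

    -- add a column filled with random values
    ALTER TABLE mytable ADD COLUMN shuffle_key double precision;
    UPDATE mytable SET shuffle_key = random();

    -- index it and cluster the table on that index
    CREATE INDEX mytable_shuffle_idx ON mytable (shuffle_key);
    CLUSTER mytable USING mytable_shuffle_idx;

    -- the helper column (and its index) can be dropped again afterwards;
    -- DROP COLUMN doesn't rewrite the table, so the shuffled order stays
    ALTER TABLE mytable DROP COLUMN shuffle_key;

The UPDATE rewrites every row once to fill in the random values, but
that's a one-off cost.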
The downsides I see are (1 and 2 come directly from the documentation):
1. When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired
on it.
2. You need free space on disk at least equal to the sum of the table
size and the index sizes.
3. Rows inserted after CLUSTER finishes will not be "protected". If you
want to be really sure, you would need to run it right before a dump and
make sure nothing is written to the table between the clustering and the
dumping (e.g. along the lines of the sketch below)...
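One way to handle point 3 could be to hold a lock that blocks writers,
but not readers, for the duration of the dump. Roughly (again with
made-up names, and with pg_dump started from another session):

    CLUSTER mytable USING mytable_shuffle_idx;
    BEGIN;
    -- EXCLUSIVE mode blocks INSERT/UPDATE/DELETE but still allows reads,
    -- and pg_dump only needs an ACCESS SHARE lock on the table
    LOCK TABLE mytable IN EXCLUSIVE MODE;
    -- ... run pg_dump from another session while this transaction is open ...
    COMMIT;

(Strictly speaking a write could still slip in between the CLUSTER and
the LOCK, but the window is small.)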
Any reason why this wouldn't work?
On 2010-12-13 14:53, Vick Khera wrote:
> On Mon, Dec 13, 2010 at 4:26 AM, Hampus Wessman
> <hampus.wessman@xxxxxxxxx> wrote:
>> CLUSTER should be able to do that for you. See
>> http://www.postgresql.org/docs/9.0/static/sql-cluster.html.
>
> That's the exact opposite of what he wished to accomplish.
>
> The only thing I can suggest is to do random order of updates, perhaps
> while also setting the fill factor for the pages to something large so
> there's no room to re-use a page, causing more shuffling.
>
> I suspect you'll have to do this for every table you've got, else you
> could leak some information about what rows were updated since the
> last import if the person has access to the file used to import the
> data originally. You'll also have to do this every time you dump the
> DB, I'd suspect...
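For completeness, the update-in-random-order idea above could perhaps
be sketched like this (names invented again, with "id" assumed to be a
primary key); I haven't tested how well it actually shuffles things,
though:

    ALTER TABLE mytable SET (fillfactor = 100);

    DO $$
    DECLARE
        r record;
    BEGIN
        FOR r IN SELECT id FROM mytable ORDER BY random() LOOP
            -- a "no-op" update still creates a new row version, which has
            -- to go to another page when the current one is full
            UPDATE mytable SET some_col = some_col WHERE id = r.id;
        END LOOP;
    END
    $$;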