
Re: pg_dump order of rows

Not at all. He wanted a way to influence the on-disk order of the rows in a table, so he could make sure they were not left in the order of insertion. That should be possible with CLUSTER. From the documentation: "When a table is clustered, it is physically reordered based on the index information" (again, see http://www.postgresql.org/docs/9.0/static/sql-cluster.html for full details).
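
For example (the table and index names here are made up for illustration, not taken from the thread), the basic invocation looks like this:

    -- Hypothetical names: physically rewrite "orders" in the order
    -- given by the existing index "orders_customer_name_idx".
    CLUSTER orders USING orders_customer_name_idx;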

Now, if one were to give CLUSTER an index that puts the data back in insertion order, then it obviously wouldn't be very useful. On the other hand, if such an index exists at all, the insertion order of the rows can easily be reconstructed no matter how they are reordered on disk, which makes the whole exercise pointless in that case anyway.

So, which index to choose? He can either cluster on an existing index, if that would remove all trace of the insertion order (any field with unique values unrelated to the insertion order would work equally well), or he can add a field with random values, index that field, and cluster on that index. CLUSTER is just the tool... Unless I've missed something fundamental about the CLUSTER command, it should be perfectly possible to reorder the data on disk any way you want with it, as long as you first create an appropriate index to cluster on.
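
A rough sketch of the second ("random column") variant, again with made-up names:

    -- Hypothetical names; sketch only. Add a column of random values,
    -- index it and cluster on that index, then drop the helper column.
    ALTER TABLE orders ADD COLUMN shuffle_key double precision;
    UPDATE orders SET shuffle_key = random();

    CREATE INDEX orders_shuffle_idx ON orders (shuffle_key);
    CLUSTER orders USING orders_shuffle_idx;

    -- The physical order is kept after the column is dropped (it only
    -- degrades again as rows are inserted or updated); dropping the
    -- column also drops the index that depends on it.
    ALTER TABLE orders DROP COLUMN shuffle_key;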

The downsides I see are (1 and 2 taken directly from the documentation):
1. When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it.
2. You need free space on disk at least equal to the sum of the table size and the index sizes.
3. Rows inserted after CLUSTER finishes will not be "protected". You would need to run it before a dump and make sure nothing is written to the table between the clustering and the dumping, if you want to be really sure (a sketch of that step follows below).
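
As the pre-dump step, and assuming the tables have already been clustered once (so a clustering index is recorded for them), a bare CLUSTER re-clusters every previously clustered table the current user owns:

    -- Run right before pg_dump: re-cluster all previously clustered
    -- tables in the current database that the calling user owns.
    CLUSTER VERBOSE;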

Any reason why this wouldn't work?

On 2010-12-13 14:53, Vick Khera wrote:
On Mon, Dec 13, 2010 at 4:26 AM, Hampus Wessman
<hampus.wessman@xxxxxxxxx>  wrote:
CLUSTER should be able to do that for you. See
http://www.postgresql.org/docs/9.0/static/sql-cluster.html.

That's the exact opposite of what he wished to accomplish.

The only thing I can suggest is to do random order of updates, perhaps
while also setting the fill factor for the pages to something large so
there's no room to re-use a page, causing more shuffling.

I suspect you'll have to do this for every table you've got, else you
could leak some information about what rows were updated since the
last import if the person has access to the file used to import the
data originally.  You'll also have to do this every time you dump the
DB, I'd suspect...
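
For reference, here is a rough sketch of the update-shuffling approach quoted above, with hypothetical table and column names; it assumes an "id" primary key and is not from the original thread:

    -- Keep pages packed full so an updated row cannot stay on its page,
    -- then touch the rows in random order so they end up shuffled.
    ALTER TABLE orders SET (fillfactor = 100);

    DO $$
    DECLARE
        r record;
    BEGIN
        FOR r IN SELECT id FROM orders ORDER BY random() LOOP
            -- A no-op update still writes a new row version, which has
            -- to be placed wherever there is free space.
            UPDATE orders SET id = id WHERE id = r.id;
        END LOOP;
    END $$;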




