Re: Clustered table order is not preserved on insert

"Dann Corbit" <DCorbit@xxxxxxxxx> · Wed, 26 Apr 2006 13:29:29 -0700

> -----Original Message-----
> From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-
> owner@xxxxxxxxxxxxxx] On Behalf Of Andrus
> Sent: Wednesday, April 26, 2006 1:05 PM
> To: pgsql-general@xxxxxxxxxxxxxx
> Subject: Re: [GENERAL] Clustered table order is not preserved on
insert
> 
> >> I want to duplicate report so that id order is preserved.
> >
> > Tables aren't ordered by definition.
> 
> From CLUSTER docs:
> 
> "When a table is clustered, it is physically reordered based on the
index
> information. "

The next sentence after that is rather important.
From:
http://www.postgresql.org/docs/8.1/interactive/sql-cluster.html
We have this:
"When a table is clustered, it is physically reordered based on the
index information. Clustering is a one-time operation: when the table is
subsequently updated, the changes are not clustered. That is, no attempt
is made to store new or updated rows according to their index order. If
one wishes, one can periodically recluster by issuing the command
again."

This is typical behavior for other database systems as well.

The only way to be totally sure that rows come back in order is to add
an ORDER BY clause to the SQL query.

There is no additional overhead to adding the command.  The data is
clustered already (in which case there are no unexpected page splits) or
it isn't.  If it isn't you need to follow the page split and get the
right data.  If it is clustered, you will just read pages serially all
the way to the end.

I don't know if it is still true, but you used to have to execute
"UPDATE STATISTICS" on a SQL*Server table if you wanted to be sure that
the pages were physically ordered along the unique clustered index
correctly (I guess that it is still true, but I have not checked the
facts).  So it is normal behavior for a database to act in this way.
When you think about it, really there is no way around it.  If you
reorganize the whole table every time there is a page split, the whole
performance objective of clustering would become a disaster instead of a
benefit.

> > If you want to get results back in a particular order use ORDER BY,
> > possibly wrapped in a view.
> 
> Using wrapper view would be excellent idea!  Thank you.
> However, I have some hundred of columns in report table.
> It is very tedious to list all those columns in view definition.
> 
> How to create a wrapper view so that it returns all columns except id
> column
> without listing all columns in SELECT clause ? How to implement EXCEPT
> COLUMNS clause like:
> 
> CREATE TEMP VIEW reportwrapper AS
> SELECT  * EXCEPT COLUMNS (id, reportname),
>    'newr' as reportname
> FROM reports
> WHERE reportname='oldr'
> ORDER BY id
> 
> INSERT INTO reports SELECT * FROM reportwrapper;
> 
> 
> > If you really want to do this, then try something like
> >
> > -- Don't drop the id column
> > UPDATE tempreport SET ...;
> > INSERT INTO report SELECT reportname,... FROM tempreport ORDER BY
id;
> > DROP TABLE tempreport;
> 
> Since id is primary key field, I got duplicate primary key error when
it
> is
> not dropped!
> 
> How to preserve order ant let postgres to generate primary keys
without
> adding extra order field?
> 
> > Although the solution I describe should work it's still not a good
idea.
> > The reason you are having this problem is that you are trying to do
two
> > things with one column. You are using "id" as a unique ID number and
> also
> > as a sort order. If you have a separate sort_order this will let you
> > duplicate reports as you desire and also allow you to re-arrange
reports
> > without changing their IDs.
> 
> Thank you. This is good explanation.
> However, this ill-designed structure is used in a lot of different
sites
> and
> now it suddenly stops working.
> Table stucture change requires re-writing parts of code, testing,
> debugging
> and creating conversion routines from previous table version.
> 
> So I'll prefer some other solution if possible.
> 
> > Can I recommend getting a book or two on relational theory - "An
> > Introduction to Database Systems" by Date is widely available.
> 
> Is it possible to read this form internet?
> I have read Joel  Celkos book "SQL Programming Style".
> Among other things Joel wrote  that every table must have only natural
> primary keys. No surrogates, no ids.
> There was no discussion about this in Joel's  book
> 
> Andrus.
> 
> 
> 
> ---------------------------(end of
broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings