Re: quick question abt pg_dump and restore

"Josh Harrison" <joshques@xxxxxxxxx> · Wed, 9 Jan 2008 11:46:14 -0500

On Jan 9, 2008 11:28 AM, Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> wrote:

On Wed, 09 Jan 2008 10:54:21 -0500
Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

> Alvaro Herrera <
alvherre@xxxxxxxxxxxxxxxxx> writes:
> > Josh Harrison escribió:
> >> Fine. I can use order by when I want  to order it in terms of
> >> some columns. But What if I want to maintain the same order as

> >> in the database1? ie., I want my rows of TableABC in Database2
> >> to be the same order as the rows in TableABC in Database 1 ???
>
> > You can't.
>
> According to the SQL standard, a table is an *unordered* collection

> of rows, and the results of any query are produced in an
> unspecified order (unless you use ORDER BY).  The ambiguity about
> row ordering is intentional and is exploited by most DBMSes
> including Postgres to improve implementation efficiency.  If you

> assume there is such a thing as a specific ordering within a table,
> you'll live to regret it eventually.

Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?

And yeah I know that once you start deleting/updating row you may
lose the advantage you gained betting on some peculiarity of the
implementation... but in case you're dealing with a mostly static
table?

eg. if I'm importing a table does it make any sense to pre-sort it
before importing it in postgres?

Okay. Let me explain this again
Lets say you load the data related to a particular person sequentially into a table in the test  database. This results in all of the data for any one person being
located one or a very few sequential data blocks.  Testing access to
the person's data then reveals access to be very fast.  However, the
data is never loaded in that way in the production database.  It is
almost always spread out across many data blocks within the database,
roughly organized by the date-time in which the data arrived.In this case access to a particular person's data is not as fast as compared to the previous one where the data are located close to each other.

we have this problem when we compare Oracle's performance with postgres since Oracle has index-only scan where it can access the data just from the index when the query involves only indexed columns. But since postgres currently doesn't implement index-only scan and goes to the heap for fetching the tuples it becomes very slow when the data are shuffled

Let me know if it makes sense now
Thanks
josh