On Oct 18, 2006, at 5:20 AM, Ilja Golshtein wrote:
When starting a database from scratch it is much faster to import the
data and then create the indexes. The time to create an index on a full
table is less than the cumulative time spent updating that index on
every insert, and the more indexes a table has, the more time each
insert spends updating them.
The problem with a live database is that removing the indexes slows
down current users, and if you are only adding 2,000 rows to a table
that already has 5,000,000 rows in it, you will lose the benefit. A
rough sketch of that technique follows.
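For illustration, here is a minimal sketch of the drop-indexes, bulk-load,
recreate-indexes approach in Python with psycopg2. The table, index, and
file names are made up, and the file is assumed to hold tab-separated rows:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()

    # Drop the index before the bulk load so inserts don't maintain it.
    cur.execute("DROP INDEX items_value_idx")

    # Stream the whole file in one COPY ... FROM STDIN (text format).
    with open("items.tsv") as f:
        cur.copy_from(f, "items", columns=("id", "value"))

    # Rebuild the index once, over the full table.
    cur.execute("CREATE INDEX items_value_idx ON items (value)")
    conn.commit()
    cur.close()
    conn.close()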
I agree with you 100%. What you are describing is a very good
and useful technique for some maintenance operations.
My current goal is to increase performance in the normal [almost ;)]
OLTP mode of my application, so temporarily removing indexes is
not an option here.
And my question remains.
Is it OK to use COPY BINARY FROM STDIN instead of multiple INSERTs?
I don't think I would use BINARY; it seems likely to be susceptible
to changes in the underlying data type storage. From the docs:
"To determine the appropriate binary format for the actual tuple data
you should consult the PostgreSQL source, in particular the *send and
*recv functions for each column's data type (typically these
functions are found in the src/backend/utils/adt/ directory of the
source distribution)."
Regular text COPY is quite a big performance win over INSERTs, and
doesn't require your application to know the vagaries of the data
storage. Also, if you have many indices, the time to update them will
probably dominate anyway, making the difference between binary and
text COPY negligible.
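To make the comparison concrete, here is a minimal sketch of both
approaches using Python and psycopg2 (the "events" table and its
columns are hypothetical):

    import io
    import psycopg2

    def load_with_inserts(cur, rows):
        # One statement per row: each INSERT is parsed and executed
        # separately, with a round trip per statement.
        for row in rows:
            cur.execute(
                "INSERT INTO events (id, action) VALUES (%s, %s)", row)

    def load_with_copy(cur, rows):
        # The same rows streamed as a single text-format COPY FROM STDIN.
        # (Real data would need tabs, newlines, and backslashes escaped.)
        buf = io.StringIO("".join("%d\t%s\n" % r for r in rows))
        cur.copy_from(buf, "events", columns=("id", "action"))

    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()
    load_with_copy(cur, [(1, "login"), (2, "logout")])
    conn.commit()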
As was mentioned, COPY can only insert static data, and does not
support rules (which you might use to support constraint exclusion,
etc.). AFAIK, the locking semantics are the same as INSERT, i.e., it
does not lock the entire table or anything like that. Since it would
allow transactions that insert data to finish faster, it should
actually work better under high concurrency.
-Casey