Re: COPY FROM STDIN instead of INSERT

Casey Duncan <casey@xxxxxxxxxxx> · Wed, 18 Oct 2006 10:13:54 -0700

On Oct 18, 2006, at 5:20 AM, Ilja Golshtein wrote:

When starting a database from scratch it is much faster to import the
data and then create the indexes. The time to create index on a full
table is less than the extra time from each index update from the
inserts. The more indexes to update the more time updating indexes  
takes.

The problem with a live database is removing the indexes slows down
current users and if you are adding 2,000 rows to a table that  
already
has 5,000,000 rows in it then you will loose the benefit.

I am 100% agree with you. What you are describing is a very good  
and useful technique for some maintenance operations.

My current goal is to increase performance in normal [almost ;)]  
OLTP mode of my application, so removing indexes for some time is  
not an option here.

And my question remains.
Is it Ok to use COPY BINARY FROM STDIN instead of multiple INSERTs?

I don't think I would use BINARY, it seems likely to be susceptible  
to changes in the underlying data type storage. From the docs:

"To determine the appropriate binary format for the actual tuple data  
you should consult the PostgreSQL source, in particular the *send and  
*recv functions for each column's data type (typically these  
functions are found in the src/backend/utils/adt/ directory of the  
source distribution)."

Regular text COPY is quite a big performance win over INSERTs, and  
doesn't require your application to know the vagaries of the data  
storage. Also, if you have many indices, time to update them will  
probably dominate anyhow, making the difference between binary and  
text copy negligible.

A was mentioned, COPY can only insert static data, and does not  
support rules (that you might use to support constraint exclusion,  
etc). AFIAK, the locking semantics are the same as INSERT, i.e., it  
does not lock the entire table or anything like that. Since it would  
allow transactions that insert data to finish faster, it should  
actually work better under high concurrency.

-Casey