On Mon, May 5, 2008 at 6:18 AM, Hans Zaunere <lists@xxxxxxxxxxx> wrote:
> > > We're using a statement like this to dump between 500K and >5 million
> > > rows.
> >
> > > COPY(SELECT SomeID FROM SomeTable WHERE SomeColumn > '0')
> > > TO '/dev/shm/SomeFile.csv'
> >
> > > Upon first run, this operation can take several minutes. Upon second
> > > run, it will be complete in generally well under a minute.
> >
> > Hmmm ... define "first" versus "second". What do you do to return it
> > to the slow state?
>
> Interesting that you ask. I haven't found a very reliable way to reproduce
> this.
>
> Typically, just waiting a while to run the same query the second time will
> reproduce this behavior. I restarted postgresql and it was reproduced as
> well. However, I can't find a way to flush buffers/etc, to reproduce the

what happens if you do something like:

select count(*) from (select ...);

i.e. don't make the .csv file each time. How's the performance without
making the csv versus making it?
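
Something concrete you could try, reusing the SomeID/SomeTable/SomeColumn
names from the original post (just a sketch; note the derived table needs an
alias, and \timing in psql reports elapsed time per statement):

    \timing

    -- same query, nothing written to disk
    SELECT count(*)
    FROM (SELECT SomeID FROM SomeTable WHERE SomeColumn > '0') AS t;

    -- same query, writing the file
    COPY (SELECT SomeID FROM SomeTable WHERE SomeColumn > '0')
      TO '/dev/shm/SomeFile.csv';

If the count(*) version shows the same slow-first-run/fast-second-run
pattern, that would point at the table data being read from disk on a cold
cache rather than at the csv writing itself.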