Search Postgresql Archives

Re: Breaking up a PostgreSQL COPY command into chunks?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

@Francisco - Yeah, the file is around 600 Mb currently, uncompressed.

You're right, our internet connection is going to be the limiting factor.

Essentially, the PostgreSQL server is in a datacentre, the server we're dumping to is in the office.

Running a script on the PostgreSQL server in the datacentre is going to be tricky (not so much technically, just from a procedures/security point of view).

Dumping to a spare table seems like an interesting point - so we'd just create the table, COPY the results to that table, then use LIMIT/OFFSET to paginate through that, then drop the table afterwards?

Currently, I'm doing a quick hack where we download an ordered list of the ids (auto-incrementing integer) into Python, chunk it up into groups of ids, then use a WHERE IN clause to download each chunk via COPY.

Would dumping to a spare table and paginating a better approach? Reasons? (Not challenging it, I just want to understand everything).

Cheers,
Victor


On Fri, Nov 8, 2013 at 6:36 PM, Francisco Olarte <folarte@xxxxxxxxxxxxxx> wrote:
On Fri, Nov 8, 2013 at 5:09 AM, Victor Hooi <victorhooi@xxxxxxxxx> wrote:
> They think that it might be limited by the network, and how fast the
> PostgreSQL server can push the data across the internet. (The Postgres
> server and the box running the query are connected over the internet).

You previously said you had 600Mb. Over the internet. ¿ Is it a very
fat pipe ? Because otherwise the limitng factor is probably not the
speed at which postgres can push the resuts, but he throughput of your
link.

If, as you stated, you need a single transaction to get a 600Mb
snapshot I would recommend to dump it to disk, compressing on the fly
( you should get easily four o five fold reduction on a CSV file using
any decent compressor ), and then send the file. If you do not have
disk for the dump but can run programs near the server, you can try
compressing on the fly. If you have got none of this but have got
space for a spare table, use a select into, paginate this output and
drop it after. Or just look at the configs and set longer query times,
if your app NEEDS two hour queries, they can be enabled. But anyway,
doing a long transaction over the internet does not seem like a good
idea to me.

Francisco Olarte


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux