Hi Seb:

On Thu, May 1, 2014 at 8:50 PM, Seb <spluque@xxxxxxxxx> wrote:
> Thanks, I'm glad to hear you've used this approach successfully.

Well, this is always successful if you are able to develop a moderately
complex script.

> It seems as though the best solution is to do a single SELECT to get the
> data out of the server (it is a view with a very complex query plan
> joining several other similar views), and then pipe the output through
> say awk to break down into chunks for writing the files, as you
> describe.

If you've got enough disk on the target machine for two copies of the data
(several Gb, even a hundred, shouldn't be a problem with today's, or even
yesterday's, machines), I'd recommend just doing a COPY of the view to a
file and then processing that file. It lets you get away with very simple
scripts for the chunking, and workstation disk is normally way cheaper than
server disk.

Given what you describe, what I would normally do is spool the whole result
to a file, maybe gzipping it along the way if it's really big (in my
experience some current machines, fast multicores with not-so-fast disks,
are faster gzipping and reading/writing a third of the data, which is
easily achievable with gzip when your data are numbers and timestamps, than
writing the full set), and then write a set of small scripts that work on
that file.

I do not grok awk (I began using perl in the 4.019 era and substituted it
for awk/sed in all kinds of one-liners and similar tasks), but with perl,
on a moderate machine (like the one I'm using, a Core Duo with 2Gb, quite
old by today's standards), you can filter the data and split it into
several hundred files at nearly full disk speed.

As I commented before, if your query results exhibit some locality on the
key, you can keep an LRU cache of open files and do the split in one pass.
I wouldn't try to force ordering in the db; I've found disk sorts work
better for this kind of problem.

Ah, another hint. I've had to do somewhat similar tasks (dumping a big
query which joins a lot) in the past. In some of those cases the job was
sweeping and concatenating a slew of really big tables while joining them
with a lot of small ones, something like sweeping 20Gb while joining it
with 15 more tables totalling 512Mb among them, generating a 100Gb
denormalized result. For those I wrote a program which just slurped all the
small tables into RAM and did the join itself before writing, which greatly
improved the speed (among other things, Postgres only had to send me 20.5Gb
over the net). Tricks like this can sometimes help because workstation RAM
is plentiful and cheap and you have all of it for a single task, while db
server RAM is scarcer.

Francisco Olarte.
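
PS: In case it helps, here's a rough, untested sketch in perl of the
"split with an LRU cache of filehandles" idea. Everything in it is made up
for the example (my_view, dump.csv.gz, the out/ directory, first column as
the chunk key), and the naive split() only copes with simple CSV that has
no quoted commas, so adapt it to your data:

#!/usr/bin/perl
# Split a COPY dump into one file per key, keeping a small LRU cache of
# open filehandles so it can be done in a single pass even when the key
# is only "mostly" grouped.  Assumes a dump produced with something like:
#   psql -c "\copy (SELECT * FROM my_view) TO STDOUT WITH (FORMAT csv)" mydb \
#     | gzip > dump.csv.gz
# and that the first column is the chunk key (and is filename-safe).
use strict;
use warnings;

my $max_open = 64;   # output filehandles to keep open at once
my %fh;              # key => open filehandle
my @lru;             # keys ordered from least to most recently used

mkdir 'out' unless -d 'out';

open my $in, '-|', 'gzip -dc dump.csv.gz' or die "gzip: $!";
while (my $line = <$in>) {
    # naive CSV split; use Text::CSV if fields can contain quoted commas
    my ($key) = split /,/, $line, 2;

    unless ($fh{$key}) {
        if (keys %fh >= $max_open) {
            my $victim = shift @lru;            # least recently used key
            close(delete $fh{$victim}) or warn "close $victim: $!";
        }
        # '>>' so a key that was evicted and re-opened keeps appending
        open my $out, '>>', "out/$key.csv" or die "out/$key.csv: $!";
        $fh{$key} = $out;
    }
    @lru = grep { $_ ne $key } @lru;            # move $key to the MRU end
    push @lru, $key;

    print { $fh{$key} } $line;
}
close $in;
close $_ for values %fh;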
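
PPS: And a similarly rough sketch of the "slurp the small tables into RAM
and join on the client" trick, with DBI/DBD::Pg. Again, all table and
column names (small_t, big_t, id, label, payload) are invented; for a
really big table you'd want a cursor or a COPY instead of the plain SELECT
so the driver doesn't buffer the whole result set on the client:

#!/usr/bin/perl
# Load the small lookup table into memory once, then stream the big table
# and emit the denormalized rows ourselves, so the server only has to
# ship the big table over the net.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 });

# 1. slurp the small table into a hash: id => label
my %label = map { $_->[0] => $_->[1] }
            @{ $dbh->selectall_arrayref('SELECT id, label FROM small_t') };

# 2. sweep the big table and do the "join" here while writing it out
my $sth = $dbh->prepare('SELECT id, payload FROM big_t');
$sth->execute;
while (my ($id, $payload) = $sth->fetchrow_array) {
    print join(',', $id, $label{$id} // '', $payload), "\n";
}

$dbh->disconnect;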