Hi:

On Thu, May 1, 2014 at 7:50 PM, Seb <spluque@xxxxxxxxx> wrote:
> I've been looking for a way to write a table into multiple files, and am
> wondering if there are some clever suggestions. Say we have a table
> that is too large (several Gb) to write to a file that can be used for
> further analyses in other languages. The table consists of a timestamp
> field and several numeric fields, with records every 10th of a second.
> It could be meaningfully broken down into subsets of say 20 minutes
> worth of records. One option is to write a shell script that loops
> through the timestamp, selects the corresponding subset of the table,
> and writes it as a unique file. However, this would be extremely slow
> because each select takes several hours, and there can be hundreds of
> subsets. Is there a better way?

As you mention looping and a shell, I suppose you are on something unix-like, with pipes et al.

You can pipe COPY ( either with the pipe options for copy, or piping a psql command, or whichever thing you like ) through a script which spits each data chunk into its corresponding file. If your data is somehow clustered in the table ( by chance or by design ) you don't even need to sort the data, just use an open file pool. I did that once with call files, chunked them into day-sized files and it worked like a charm ( and if you need the files sorted, you can then use sort on each of them, which normally is quite fast ).

For your description of the data, with a pipe, you could read a line, extract a key for the record ( the timestamp rounded down to 20 minutes would be a good one ), get an open output file for append ( using a small caching layer ), and write the line to it; a rough sketch of such a splitter is included below.

Depending on how many files you expect, how much RAM you have and how many files your OS allows you to open, other solutions exist. And if you do not have enough ram / open files / clustering for any of them, there is a multitude of tricks ( if, say, you have 3 years' worth, no correlation, and can only open/buffer about 1000 files, you could split from the db into day-sized chunks and then split each of those into 20-minute ones ).

Regards.

Francisco Olarte.
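
A minimal sketch of such a splitter, assuming Python 3, a COPY ... TO STDOUT stream in CSV format piped on stdin with the timestamp as the first column in PostgreSQL's default text form ( "2014-05-01 19:50:00.1,..." ); the 256-handle pool size, the file naming and the script name split_by_20min.py are illustrative, not from the original message:

    #!/usr/bin/env python3
    # Usage (illustrative):
    #   psql -c "COPY big_table TO STDOUT (FORMAT csv)" | ./split_by_20min.py
    # Appends each input line to a per-20-minute file, keeping only a
    # limited pool of output files open at any time.
    import sys
    from collections import OrderedDict

    MAX_OPEN = 256                 # keep below your OS open-file limit
    pool = OrderedDict()           # bucket key -> open file, in LRU order

    def out_file(key):
        # Small caching layer: reuse an already-open file, evict the
        # least recently used one when the pool is full.
        fh = pool.pop(key, None)
        if fh is None:
            if len(pool) >= MAX_OPEN:
                pool.popitem(last=False)[1].close()
            fh = open('chunk-%s.csv' % key, 'a')
        pool[key] = fh             # mark as most recently used
        return fh

    for line in sys.stdin:
        # "2014-05-01 19:50:00.1,..." -> minutes rounded down to a
        # multiple of 20 -> key "2014-05-01_1940"
        minute = int(line[14:16])
        key = line[:13].replace(' ', '_') + '%02d' % (minute - minute % 20)
        out_file(key).write(line)

    for fh in pool.values():
        fh.close()

Because the files are opened in append mode, a bucket can be evicted from the pool and reopened later without losing rows, so the input does not need to be sorted; if you want each chunk file ordered, run sort on the individual files afterwards, as suggested above.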