On Thu, 1 May 2014 20:20:23 +0200, Francisco Olarte <folarte@xxxxxxxxxxxxxx> wrote:

[...]

> As you mention looping and a shell, I suppose you are in something
> unix-like, with pipes et al. You can pipe COPY ( either with the pipe
> options for copy, or piping a psql command, or whichever thing you
> like ) through a script which spits each data chunk into its
> corresponding file. If your data is somehow clustered into the table
> ( by chance or by design ) you don't even need to sort the data, just
> use an open file pool. I did that once with call files, chunked them
> into day-sized files and it worked like a charm ( and if you need the
> files sorted, you can then use sort on each of them, which normally
> is quite fast ).
>
> For your description of the data, with a pipe, you could read a line,
> extract a key for the record ( the timestamp rounded down to 20
> minutes would be a good one ), get an open output file for append
> ( using a small caching layer ), and write it.
>
> Depending on how many files you expect, how much RAM you have and how
> many files your OS allows you to open, other solutions exist. And if
> you do not have enough RAM / open files / clustering for any of them,
> there are a multitude of tricks ( if, say, you have 3 years' worth,
> no correlation, and can only open/buffer about 1000 files, you could
> split from the db into day-sized chunks and then split each of those
> into 20-minute ones ).

Thanks, I'm glad to hear you've used this approach successfully. It
seems as though the best solution is to do a single SELECT to get the
data out of the server (it is a view with a very complex query plan
joining several other similar views), and then pipe the output
through, say, awk to break it down into chunks for writing the files,
as you describe.
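Something like the following untested sketch is roughly what I have in
mind; the database and view names are placeholders, and it assumes the
timestamp is the first tab-separated column of the COPY output:

  psql -X -d mydb -c "COPY ( SELECT * FROM flux_view ) TO STDOUT" |
  awk -F '\t' '
    {
      # assumed layout: column 1 is a timestamp like "2014-05-01 20:23:45"
      split($1, d, /[- :]/)                  # year month day hour min sec
      bucket = d[1] d[2] d[3] "_" d[4] sprintf("%02d", int(d[5] / 20) * 20)
      file = "chunk_" bucket ".txt"
      if (!(file in open)) {
        if (nopen >= 500) {                  # crude cap on open files
          for (f in open) { close(f); delete open[f] }
          nopen = 0
        }
        open[file] = 1
        nopen++
      }
      print >> file                          # append, so re-opened files keep their data
    }'

If the rows within each chunk need to be ordered, a final pass of sort
over each file should be quick, as you mention.

Cheers,

-- 
Seb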