On 2020-03-23 17:18:45 -0700, pinker wrote:
> Christopher Browne-3 wrote
> > Well, you're paying for a lot of overhead in that, as you're
> > establishing a psql command, connecting to a database, spawning a
> > backend process, starting a transaction, committing a transaction,
> > closing the backend process, disconnecting from the database, and
> > cleaning up after the launching of the psql command. And you're
> > doing that 500 million times.
> >
> > The one thing I left off that was the loading of a single tuple into
> > json_parts. [...]
> >
> > Reducing that overhead is the single most important thing you can do.
>
> Yes, I was thinking about that, but I have no idea how to do it right
> now.

Do you know any programming language (Python, Perl, ...)? You'll
probably get a huge gain from writing a script that just opens the
connection once and then inserts each file.

COPY is usually even faster by a fair amount, but since you have to
read the data for each row from a different file (and, if I understood
you correctly, a remote one at that), the additional speedup is
probably not that great in this case.

Splitting the work into batches and executing several batches in
parallel probably helps.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
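
A minimal sketch of the single-connection approach hp describes, in
Python with psycopg2. The table name json_parts comes from the thread;
the column name (doc, assumed jsonb), the DSN, and the directory layout
are assumptions for illustration, not details from the original posts:

#!/usr/bin/env python3
# Open one connection for the whole run instead of spawning psql per file.
import sys
from pathlib import Path

import psycopg2

def load_files(dsn, directory, batch=10_000):
    conn = psycopg2.connect(dsn)   # connect once, reuse for every insert
    cur = conn.cursor()
    try:
        for i, path in enumerate(sorted(Path(directory).glob("*.json")), 1):
            cur.execute(
                "INSERT INTO json_parts (doc) VALUES (%s)",
                (path.read_text(),),
            )
            if i % batch == 0:     # commit in batches, not once per row
                conn.commit()
        conn.commit()              # commit any remaining partial batch
    finally:
        conn.close()

if __name__ == "__main__":
    load_files(sys.argv[1], sys.argv[2])

Committing every few thousand rows keeps transactions to a manageable
size while still avoiding the per-row connection and transaction
overhead quoted above.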
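And a sketch of the "several batches in parallel" suggestion, under the
same assumptions (table json_parts, column doc, plus an assumed DSN and
directory): each worker process opens its own connection and loads a
round-robin share of the files.

from multiprocessing import Pool
from pathlib import Path

import psycopg2

DSN = "dbname=mydb"   # assumed connection string
WORKERS = 4

def load_chunk(paths):
    # Each worker gets its own connection; connections can't be shared
    # across processes.
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    try:
        for p in paths:
            cur.execute(
                "INSERT INTO json_parts (doc) VALUES (%s)",
                (Path(p).read_text(),),
            )
        conn.commit()
    finally:
        conn.close()

def main():
    paths = [str(p) for p in Path("parts").glob("*.json")]
    # Round-robin split into one chunk per worker.
    chunks = [paths[i::WORKERS] for i in range(WORKERS)]
    with Pool(WORKERS) as pool:
        pool.map(load_chunk, chunks)

if __name__ == "__main__":
    main()

How many workers actually help depends on where the bottleneck is
(reading the remote files vs. the database server), so the worker count
is something to measure rather than fix in advance.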