Re: INSERTing lots of data

Szymon Guz <mabewlun@xxxxxxxxx> · Fri, 28 May 2010 11:48:16 +0200

2010/5/28 Joachim Worringen <joachim.worringen@xxxxxxxx>

Greetings,

my Python application (http://perfbase.tigris.org) repeatedly needs to insert lots of data into an exsting, non-empty, potentially large table. Currently, the bottleneck is with the Python application, so I intend to multi-thread it. Each thread should work on a part of the input file.

I already multi-threaded the query part of the application, which requires to use one connection per thread - cursors a serialized via a single connection.

Provided that

- the threads use their own connection

- the threads perform all INSERTs within a single transaction

- the machine has enough resources

 will I get a speedup? Or will table-locking serialize things on the server side?

Suggestions for alternatives are welcome, but the data must go through the Python application via INSERTs (no bulk insert, COPY etc. possible)

Remember about Python's GIL in some Python implementations so those threads could be serialized at the Python level.

This is possible that those inserts will be faster. The speed depends on the table structure, some constraints and triggers and even database configuration. The best answer is: just check it on some test code, make a simple multithreaded aplication and try to do the inserts and check that out.

regards
Szymon Guz