On Oct 25, 2007, at 10:30 AM, Jean-David Beyer wrote:
I have just changed around some programs that ran too slowly (too much time in io-wait) and they sped up greatly. This was not unexpected, but I wonder about the limitations.
By transaction, I mean a single INSERT or a few related INSERTs.
What I used to do is roughly like this:
for each file {
    for each record {
        BEGIN WORK;
        INSERT stuff in table(s);
        if error {
            ROLLBACK WORK;
        }
        else {
            COMMIT WORK;
        }
    }
}
The speedup was the obvious one:
for each file {
    BEGIN WORK;
    for each record {
        INSERT stuff in table(s);
    }
    if error {
        ROLLBACK WORK;
    }
    else {
        COMMIT WORK;
    }
}
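
In concrete SQL, a minimal sketch of one file's worth of work as a single transaction might look like the following (the table and column names are made up for illustration; the client issues ROLLBACK WORK instead of COMMIT WORK if any INSERT reports an error):

    BEGIN WORK;
    INSERT INTO mytable (name, value) VALUES ('first record', 1);
    INSERT INTO mytable (name, value) VALUES ('second record', 2);
    -- ... one INSERT per record in the file ...
    COMMIT WORK;  -- or ROLLBACK WORK if any INSERT failed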
This means, of course, that the things I think of as transactions have been bunched into a much smaller number of what PostgreSQL thinks of as large transactions, since there is only one per file rather than one per record.
Now if a file has several thousand records, this seems to work out just great.
But what is the limitation on such a thing? In this case, I am just populating the database and there are no other users at such a time. I am willing to lose the whole insert of a file if something goes wrong -- I would fix whatever went wrong and start over anyway.
But at some point, disk IO would have to be done. Is this just a function of how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or does it have to do with wal_buffers and checkpoint_segments?
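
For reference, those settings live in postgresql.conf; a sketch with purely illustrative values (assumptions, not recommendations from this thread) might look like:

    # illustrative values only -- tune for your own hardware and version
    shared_buffers = 16384        # data-page buffer cache, in 8kB pages (~128MB)
    wal_buffers = 128             # WAL buffered in memory before writing, in 8kB pages (~1MB)
    checkpoint_segments = 32      # WAL segments written between checkpoints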
You're reading data from a file and generating inserts? Can you not use COPY? That would be the most performant option.
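
A minimal sketch of the COPY route, assuming a hypothetical table and a comma-separated data file (server-side COPY reads a path visible to the PostgreSQL server; psql's \copy reads the file on the client machine instead):

    -- server-side COPY: the path is read by the PostgreSQL server process
    COPY mytable (name, value) FROM '/path/to/datafile.csv' WITH CSV;

    -- or, from psql, reading the file on the client side:
    -- \copy mytable (name, value) from 'datafile.csv' with csv

Since COPY runs as a single statement, a bad row aborts the whole load, which matches the lose-the-whole-file-and-start-over tolerance described above.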
Erik Jones
Software Developer | Emma®
erik@xxxxxxxxxx
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com