Jean-David Beyer wrote: > This means, of course, that the things I think of as transactions have been > bunched into a much smaller number of what postgreSQL thinks of as large > transactions, since there is only one per file rather than one per record. > Now if a file has several thousand records, this seems to work out just great. Using the small transactions, you were limited by the speed your hard disk flush the commit WAL records to the disk. With small transactions like that, it's not about the bandwidth, but latency of the hard drive. Using larger transactions helps because you get more work done on each disk operation. Upcoming 8.3 release will have a feature called "asynchronous commit", which should speed up those small transactions dramatically, if you don't want to batch them into larger transactions like you did: http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT > But what is the limitation on such a thing? In this case, I am just > populating the database and there are no other users at such a time. I am > willing to lose the whole insert of a file if something goes wrong -- I > would fix whatever went wrong and start over anyway. > > But at some point, disk IO would have to be done. Is this just a function of > how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or does it > have to do with wal_buffers and checkpoint_segments? Well, you have to do the I/O eventually, regardless of shared_buffers. Common wisdom is that increasing wal_buffers from the default helps with bulk loading like that, up to a point. Increasing checkpoint_segments helps as well. After you've done all that, you're going to be limited by either the bandwidth of your I/O system, or the speed of your CPU, depending on your hardware. Using COPY instead of INSERTs will help if it's CPU. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match