On Tue, 2006-03-21 at 06:46, Edoardo Serra wrote: > Hi all, > I'm having a very strange performance > problems on a fresh install of postgres 8.1.3 > I've just installed it with default option and > --enable-thread-safety without tweaking config files yet. > > The import of a small SQL files into the DB (6 > tables with 166.500 total records, INSERT syntax) > took me more than 18 minutes as shown below > (output of "time ./psql benchmarks < dump.sql") > > real 18m33.062s > user 0m10.386s > sys 0m7.707s > > The server is an > - Intel(R) Xeon(TM) CPU 3.60GHz - 1MB L2 > - 1 GB RAM > - 2x HDD SCSI U320 RAID 1 Hardware (HP 6i controller) > > The same import, tried on an another low-end > server with a fresh install of postgres 8.1.3 gave me: > > real 2m4.497s > user 0m6.234s > sys 0m6.148s Here's what's happening. On the "fast" machine, you are almost certainly using IDE drives. PostgreSQL uses a system call called "fsync" when writing data out. It writes the data to the write ahead logs, calls fsync, and waits for it to return. fsync() tells the drive to flush its write buffers to disk and tell the OS when it has completed this. SCSI drives dutifully write out those buffers, and then, only after they're written, tell the OS that yes, the data is written out. Since SCSI drives can do other things while this is going on, by using command queueing, this is no great harm to performance, since the drive and OS can transfer other data into / out of buffers during this fsync operation. Meanwhile, back in the jungle... The machine with IDE drives operates differently. Most, if not all, IDE drives, when told by the OS to fsync() tell the OS immediately that the fsync() call has completed, and the data is written to the drive. Shortly thereafter, the drive actually commences to write the data out. When it gets a chance. The reason IDE drives do this is that until very recently, the IDE interface allowed only one operation at a time to be "in flight" on an interface / drive. So, if the IDE drive really did write the data out, then report that it was done, it would be much slower than the SCSI drive listed above, because ALL operations on it would stop, waiting in line, for the caches to flush to the platters. For PostgreSQL, the way IDE drives operate is dangerous. Write data out, call fsync(), get an immediate return, mark the data as committed, move on the next operation, operator trips over power cord / power conditioner explodes, power supply dies, brown out causes the machine to reboot, et. al., and when the machine comes up, PostgreSQL politely informs you that your database is corrupt, and you come to the pgsql-general group asking how to get your database back online. Very bad. With SCSI drives, the same scenario results in a machine that comes right back up and keeps on trucking. So, what's happening to you is that on the machine with SCSI drives, PostgreSQL, the OS, and the drives are operating properly, making sure your data is secure, and, unfortunately, taking its sweet time doing it. Given that your .sql file is probably individual inserts without a transaction, this is normal. Try wrapping the inserts in the sql file in begin; / commit; statements, like so: begin; insert into table ... (100,000 inserts here) insert into table ... commit; and it should fly. And, if there's a single bad row, the whole import rolls back. Which means you don't have to figure out where the import stopped or which rows did or didn't take. You just fix the one or two bad rows, and run the whole import again. When a good friend of mine first started using PostgreSQL, he was a total MySQL bigot. He was importing a 10,000 row dataset, and made a smartassed remark after 10 minutes how it would have imported in minutes on MySQL. It was a test database, so I had him stop the import, delete all the imported rows, and wrap the whole import inside begin; and commit; The import took about 20 seconds or so. Now, for the interesting test. Run the import on both machines, with the begin; commit; pairs around it. Halfway through the import, pull the power cord, and see which one comes back up. Don't do this to servers with data you like, only test machines, obviously. For an even more interesting test, do this with MySQL, Oracle, DB2, etc... I've been amazed that the looks of horror I get for suggesting such a test are about the same from an Oracle DBA as they are from a MySQL DBA. :)