Hello all,

I've been struggling with some performance questions regarding our Postgres databases. Here's the background:

We run four ~25-30 GB databases which cache information from eBay. These databases have had performance issues since before I joined the company, and they have gone through a number of iterations. Initially they were deployed as one huge database; performance was apparently unacceptable. They were split and then tried on a variety of hardware platforms. When I joined the company last year, the databases were deployed on 12-disk RAID5 arrays on dual-proc AMD machines with 4 GB of RAM, running Debian Woody and Postgres 7.2.

These systems suffered gradually decreasing performance accompanied by steadily growing disk usage. The DBA had concluded that the VACUUM command did/does not work on these systems, because even after a VACUUM FULL the size of the database kept increasing. So, as things stand with the PG 7.2 machines, vacuuming is run nightly, and whenever the database size reaches 40 GB on disk (the point at which performance has degraded below tolerance), the DBA exports the data, deletes the database, and re-imports the data, shrinking it back to the actual size of the dataset. This process is time-consuming and costly, and the servers we are deployed on do not meet our stability requirements.

So, after much pushing, I was able to deploy a 12-disk RAID5 dual-proc AMD64 machine with 16 GB of RAM running FreeBSD and Postgres 8.1. The performance increase was immediate, obvious, and dramatic, as you might expect from such a large boost in the underlying hardware. The upgrade also appears to have solved the VACUUM issue: regular VACUUM commands now seem able to hold the database size at a steady state (allowing, of course, for fluctuations from actual changes in the dataset). We are now planning to move the other three databases to the new platform and hardware. However, we are still seeing a gradual decrease in performance over time - or so the application engineers claim. The DBA and I have been banging our heads against this for a month.

Which brings me to the questions:

1) How does one define 'performance' anyway? Is it average time to complete a query? If so, what kind of query? Is it some other metric?

2) I've combed the archives and seen evidence that people out there are running much, much larger databases on comparable hardware with decent performance. Is this true, or is my dataset at about the limit of my hardware?

3) Though this may seem irrelevant since we are moving away from the platform, it would still be good to know: was VACUUM actually completely useless on PG 7.2, or is there some other culprit on those legacy machines?

4) Much of my reading of the PG docs and list archives suggests that a lot of performance tuning is done at the query level - you have to know how to ask for information in an efficient way. To that end, I took a look at some of the queries we get on a typical day. On average, 24 times per minute, our application causes a unique key violation. This struck me as strange, but the VP of Engineering says this is a performance ENHANCEMENT - the code doesn't bother checking for the unique key because it depends on the database to enforce it. My reading of the archives and docs seems to indicate that this could be causing the constantly increasing database size (a sketch of the pattern is below).
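To make question 4 concrete, here is a rough sketch of the pattern as I understand it. The table and column names are invented for illustration - the real schema is different - but the shape is what the application does:

    -- What the application does today: INSERT unconditionally and
    -- rely on the unique constraint to reject duplicates.
    INSERT INTO ebay_items (item_id, title, price, fetched_at)
    VALUES (12345, 'Example item', 9.99, now());
    -- About 24 times a minute a statement like this fails with a
    -- unique key violation and the transaction is rolled back.

    -- One alternative that occurs to me (8.1 has no built-in UPSERT):
    -- only insert when the key is not already present. Still racy
    -- under concurrency, but it avoids the failed transaction in
    -- the common case.
    INSERT INTO ebay_items (item_id, title, price, fetched_at)
    SELECT 12345, 'Example item', 9.99, now()
    WHERE NOT EXISTS
          (SELECT 1 FROM ebay_items WHERE item_id = 12345);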
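And for reference, this is the sort of check I think would show whether those failed inserts are leaving anything behind for VACUUM to clean up (same invented table name; corrections welcome if I'm misreading the 8.1 docs):

    -- Per-table: VACUUM VERBOSE reports how many dead row versions
    -- it found and removed.
    VACUUM VERBOSE ebay_items;

    -- Database-wide: the summary at the end reports free space map
    -- usage, which should show whether max_fsm_pages is set high
    -- enough for the nightly VACUUM to keep up.
    VACUUM VERBOSE;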
So, now that I've rambled about it: does an INSERT transaction that is rolled back due to a unique key violation leave dead rows in the table? If so, why?

I really appreciate any information you guys can give me. I'm convinced that PG is the best database for our needs, but I need to be able to get this database performing well enough to convince the bigwigs.

Regards,
Paul Lathrop
Systems Administrator