The parsing has turned out to be pretty intense: it takes about 10-20 minutes for any project, and while it is running it really slows down the site's response. I tested serving static webpages from Apache and running endless loops in PHP, but the choke point seems to be running any other query against Postgres while a PHP page is being constructed during parsing.
Do you do lots of INSERTs without explicitly using transactions? You also need to run EXPLAIN ANALYZE on your most frequent queries. It is very possible the slowdown is just from a forgotten index.
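For example, if the parser issues one INSERT per row with autocommit on, wrapping the whole batch in an explicit transaction usually makes a big difference. The table and column names below are made up, just to show the shape of it:

    BEGIN;
    INSERT INTO parsed_rows (project_id, line_no, payload) VALUES (1, 1, '...');
    INSERT INTO parsed_rows (project_id, line_no, payload) VALUES (1, 2, '...');
    -- ...thousands more rows...
    COMMIT;

Otherwise each INSERT is its own transaction and has to be flushed to disk separately before the next one can start.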
As an example, the original designer specified a separate table for each project. Since they all had the same structure, I suggested combining them into a single table with a project_id column, but he said it would take too long to query. I was suspicious, but I went with his design anyway.
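To be concrete about the two designs (the names below are made up, not our real schema), the current layout versus what I suggested looks roughly like this:

    -- current design: one table per project
    CREATE TABLE project_1_data (item_id serial PRIMARY KEY, payload text);
    CREATE TABLE project_2_data (item_id serial PRIMARY KEY, payload text);

    -- what I suggested: one shared table keyed by project_id
    CREATE TABLE project_data (
        item_id    serial PRIMARY KEY,
        project_id integer NOT NULL,
        payload    text
    );
    CREATE INDEX project_data_project_id_idx ON project_data (project_id);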
From the small size of the dataset I don't see a justification for this...
It turned out he was right for our current setup. When I needed to empty the project table to re-parse data, doing a cascading delete could take up to 10 minutes! I cut re-parsing time in half by just dropping the table and creating a new one, which was fine to do since the table belonged to only one project anyway. But I hate to think how long a delete, cascading to the child tables, would have taken if all the project data were in a single table.
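Roughly, the difference was this (table names again made up):

    -- old way: empties the table, but each deleted row cascades into the child tables
    DELETE FROM project_1_data;

    -- new way: drop and recreate; CASCADE also drops the child tables'
    -- foreign key constraints, which then have to be re-added
    DROP TABLE project_1_data CASCADE;
    CREATE TABLE project_1_data (item_id serial PRIMARY KEY, payload text);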
That's probably because you forgot to create an index on the referencing column, i.e. the foreign key column in each child table. Postgres does not create those automatically, so every row you delete from the parent forces a sequential scan of the child tables.
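Assuming a child table called project_items with a foreign key column named project_ref (made-up names), the fix is a one-liner:

    -- the FK constraint itself does not create this index
    CREATE INDEX project_items_project_ref_idx ON project_items (project_ref);

With that in place, the cascading delete can look up child rows with an index scan instead.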
Since I'm not an expert in Postgres database design, I'm assuming I've done something suboptimal. Are there common techniques for tuning Postgres performance? Do we need beefier hardware? Or is it a problem with how PHP or Apache pools connections to the database?
It depends on a lot of stuff, but the first thing is to run EXPLAIN ANALYZE on your queries and post the results here.
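For instance, take one of the queries that runs while a page is being built and prefix it like this (substitute your real query):

    EXPLAIN ANALYZE
    SELECT * FROM project_data WHERE project_id = 42;

Things to look for in the output: Seq Scan nodes on large tables, and estimated row counts that are wildly different from the actual ones.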