Re: Query meltdown: caching results

Gordon wrote:
I'm working on a CMS that, in addition to the database-stored version
of articles for easy searching, sorting, etc., also stores an HTML file
version so pages can be fetched with the minimum of overhead (browsing
articles has no more overhead than accessing any other HTML file on
the server).

As I've been trying to keep the system modular I've taken to using an
OO approach to the server side scripts, which are all written in PHP 5
and use PDO for database access.  I've also been using prepared
statements almost exclusively, for security and performance reasons.
I've tried to wrap sequences of queries in transactions as well, to
eliminate the "every query is its own transaction" overhead.

With previous projects which I wrote using structured programming
methods it was quite easy to hold caches of results and keep database
queries to a minimum, but I've found this extremely difficult to pull
off when using the OO approach, and now it's starting to have some
real performance consequences.  The biggest one comes when publishing
a document that has siblings.  CMS content is organized in a tree with
folders, subfolders and documents.  A document can be published, where
both an HTML and a database copy exist, or unpublished, where only the
database version exists, thus denying visitors to the site access to
it.  Documents in a folder get a sidebar with links to the other
documents in the same folder, and when you change the published status
of a document then all the other documents that are also published in
that folder have to be republished in order to update their
sidebars.

This means fetching a list of all the documents that have the same
parent and a published flag of true, then using the text stored in
the database to generate each HTML page and saving it to disk.
Documents have an associated template, which also has to be fetched
from the database.  And all documents have data such as their path,
which is a chain of the document's parents back to the root so that
things like breadcrumbs can be generated.
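
Roughly, the sibling fetch looks something like this (the table and
column names here are placeholders rather than my real schema):

<?php
// Placeholder schema: documents(id, parent_id, published, title, body)
$pdo = new PDO('pgsql:host=localhost;dbname=cms', 'user', 'pass');
$parentId = 42;   // folder whose children are being republished

$stmt = $pdo->prepare(
    'SELECT id, title, body
       FROM documents
      WHERE parent_id = :parent
        AND published = TRUE'
);
$stmt->execute(array(':parent' => $parentId));
$siblings = $stmt->fetchAll(PDO::FETCH_ASSOC);

Each row then gets run through its template and written out to disk.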

In the structured approach I'd have just cached stuff like the trail
back to the root as I know it'll be the same for all documents, so I'd
only have to run the sequences of queries to get the full trail once.
But as each instance of a document is independent of all the others,
doing things like this is proving really difficult.

I need to find a way of not running queries that I don't need to,
either in the PHP script, or in the Postgres database.  What I need is
for a result set to be cached somewhere, either by Postgres or PHP, so
when it sees the same query again in a given session it just returns
the previously fetched result set.  The cache also needs to be able to
disregard its cached result sets when an event that changes a table
occurs (insert, update, delete, etc).
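
Something like this is what I have in mind on the PHP side (just a
sketch; the class name and details are invented):

<?php
// Sketch: cache result sets keyed on SQL text plus parameters,
// and throw the whole cache away whenever a write happens.
class QueryCache
{
    private $cache = array();

    public function fetch(PDO $pdo, $sql, array $params = array())
    {
        $key = md5($sql . serialize($params));
        if (!array_key_exists($key, $this->cache)) {
            $stmt = $pdo->prepare($sql);
            $stmt->execute($params);
            $this->cache[$key] = $stmt->fetchAll(PDO::FETCH_ASSOC);
        }
        return $this->cache[$key];
    }

    // Called from every INSERT/UPDATE/DELETE code path.
    public function invalidate()
    {
        $this->cache = array();
    }
}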

On the PHP side I've written a simple Database class that extends PDO
and that I use in its place.  It's a simple class that basically lets
me nest calls to beginTransaction(), commit() and rollback().  It only
starts an actual transaction if a counter is 0; otherwise it just
increments the counter.  Commit only actually commits when the counter
is 1, and decrements it otherwise.  Rollback sets an error flag and
decrements the counter, and only rolls back when the counter is 1.  If
the error flag is set then commit will actually roll back instead.
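
In rough outline it works like this (a sketch of the idea rather than
the actual class):

<?php
// Sketch of the transaction-counting wrapper described above.
class Database extends PDO
{
    private $depth = 0;
    private $failed = false;

    public function beginTransaction()
    {
        if ($this->depth == 0) {
            // only the outermost call starts a real transaction
            parent::beginTransaction();
            $this->failed = false;
        }
        $this->depth++;
        return true;
    }

    public function commit()
    {
        if ($this->depth == 1) {
            // outermost commit: roll back instead if a nested call failed
            $this->failed ? parent::rollBack() : parent::commit();
        }
        $this->depth--;
        return true;
    }

    public function rollBack()
    {
        $this->failed = true;
        if ($this->depth == 1) {
            parent::rollBack();
        }
        $this->depth--;
        return true;
    }
}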

My options are, as far as I can tell,

1) replace the Database PDO extending class with something else that
provides query results caching in PHP, or
2) get Postgres itself to cache the results of queries to avoid
running them repeatedly during a session.

I seem to remember MySQL providing some kind of results caching; can
Postgres do the same?  Has anyone else run into similar problems, and
how did they overcome them?


I don't know about Postgres, but yes, MySQL does have caching.  You
could also take a look at APC (Alternative PHP Cache), depending on
your setup.  It is very easy to use, and very easy to monitor what's
actually going on with your pages in the cache.  Once a page is
generated you can store it in the cache and give it a time to live
before going back to the db.
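
For the page caching part it is only a couple of calls; something like
this (render_article and the key scheme are just made up for the
example):

<?php
// Stand-in for the real page generator.
function render_article($id)
{
    return '<html><body>Article ' . (int)$id . '</body></html>';
}

$articleId = 42;
$key = 'article_html_' . $articleId;

$html = apc_fetch($key, $hit);
if (!$hit) {
    $html = render_article($articleId);
    apc_store($key, $html, 300);   // keep it for five minutes
}
echo $html;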


--
Norman
Registered Linux user #461062
