Re: Query meltdown: caching results

Gordon <gordon.mcvey@xxxxxxxxxxxx> · Wed, 27 Feb 2008 02:37:06 -0800 (PST)

On Feb 26, 5:26 pm, st...@xxxxxxxxxxxxxxxx (Gregory Stark) wrote:
> "Norman Peelman" <npeel...@xxxxxxxxxx> writes:
> >> My options are, as far as I can tell,
>
> >> 1) replace the Database PDO extending class with something else that
> >> provides query results caching in PHP, or
> >> 2) get Postgres itself to cache the results of queries to avoid
> >> running them repeatedly during a session.
>
> You might consider looking at memcached. One way to use it would be to have
> the PHP application check for the cached object first and use it rather than
> do any database queries. Then you can use pgmemcached to allow triggers to
> invalidate cached objects whenever the underlying data changes. (Or you could
> even just use pl/php to update or invalidate the cached object through the
> same code library)
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>   Ask me about EnterpriseDB's Slony Replication support!
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

Thanks for the replies, but the problem is really centered around how
my script interacts with the database.  I know what the problem is,
and if I was using a structured approach I could figure out
workarounds fairly quickly, but I chose to do this project in OOP for
a few reasons (the old CMS was hacked together in a hurry by an
employee who is now long gone, is horribly written and completely
unmaintainable, the size of the project seemed to warrant an OOP
approach, we recently upgraded to PHP 5 and I wanted to take advantage
of the new features, PDO and prepared statements appeared from the
documentation to offer major security and performance enhancements,
and I wanted to increase my experience with OOP development as most
work I've done before now has been structured) and the high level of
compartmentalization that OOP demands is coming back to haunt me
now.

The problem comes when a document is published or unpublished.  All
the documents that relate to that document need to be republished to
add or remove the link to the document in question.  When the document
is published or unpublished the script gets related documents (at the
moment other documents that share the same folder, but this may change
in the future to cover more related content) and republishes them.

Each document has associated data, such as the ID of the template to
apply, its parent, its path back to root, etc, that are generated by
querying the database in various ways.  For example, the route back to
path is fetched by iterativly getting the parent folder, getting that
folder's parent, etc until the script hits the route.  Templates are
fetched by looking ath the template ID associated with the document.
If this is 0, then the script goes to the parent folder and gets the
template associated with the folder.  If that is 0 as well then it
iterativly goes up until it finds a template to apply or until it hits
the root, in which case it applies a default template.  The code
fragment from the script that does this looks like this:

$db	-> beginTransaction ();
if ($doc		= CmsItem::factory ('CmsDoc', intval ($_GET ['itm_id']),
$db, $user))
{
	if ((($doc	-> itemProps ['itm_publish']) && ($doc	-> unpublish ()))
	|| ($doc	-> publish ()))
	{
		// Republish related documents
		foreach ($doc -> getSiblingObjects () as $thisDoc)
		{
			if ((get_class ($thisDoc)	== 'CmsDoc')
			&& ($thisDoc	-> itemProps ['itm_publish']))
			{
				$thisDoc	-> republish ();
			}
		}
		// Item status changed
		$db	-> commit ();
		$_SESSION ['messageStack'][]	= ($doc	-> itemProps ['itm_publish']?
			'Item published':
			'Item unpublished');
	}
	else
	{
		// Couldn't change published status
		$db	-> rollback ();
		$_SESSION ['messageStack'][]	= ($doc	-> itemProps ['itm_publish']?
			'Unable to unpublish item':
			'Unable to publish item');
	}
}

GetSiblingObjects () runs a query that gets a list of IDs that share
the same parent as the current document.  It then iterates the list
and spawns a new CMS item for each item in the list and returns them
as an array.  As folders could be returned as well as documents we
only run republish () on those items.

CmsDoc -> publish () and CmsDoc -> unpublish () toggle a boolean
column in the database between true and false for the item being (un)
published.  unpublish () also deletes the concrete file associated
with the DB entry.

publish () and republish () write out a concrete HTML file based on
the content stored in the table for the document in question and the
template that should be applied.  The template is determined from a
template ID column. If it's 0 then the script walks up the tree until
it finds a template to use as described above.

publish () and republish () rely on a method of CmsDoc called generate
(), which creates a Smarty instance, does the work described above and
generates a HTML string for publish () and republish () to write out
to disk.

	public function generate ()
	// Generate document HTML
	{
		$paths		= $this -> getTemplatePaths ();
		$page		= new Template ($paths ['tplPath'], $paths ['cplPath']);
		$page		-> assign_by_ref	('document',	$this -> itemProps);
		$page		-> assign			('siblings',	$this		-> getSiblings ());
		$page		-> assign			('path',		$this		-> getTrail ());
		return ($page -> fetch ($paths ['tplFile']));
	}

Template is a class that inherits from Smarty, that just does a little
setup housekeeping for me and allows me to specify both the template
and the compile dirs from arguments instead of just the template dir.
getTemplatePaths returns where templates are stored and where they'll
be compiled.  getSiblings () is like getSiblingObjects () but doesn't
create objects, just returns a list of siblings.  getTrail returns an
array of folders fro mthe root to the document's containing folder.

The problem is that there is a vast number of redundant queries being
run.  The documents all share the same parent (the list was obtained
in the first place with getSiblingObjects () but for every one that
gets republished the getSiblings () and getTrail () queries get run
(As every document in a folder can have a different template the
getTemplatePaths () query would have to be run for every one anyway).
All the solutions I can think of for this problem would involve
breaking the principle that objects should not depend on a knowledge
of the inner workings of unrelated objects.  the only thing these
objects are guaranteed to all have in common is that they all share
the same Database object between them (Database is my PDO extending
class). This suggests to me that the best solution is to cache the
results of read queries either in the database engine itself, or in
the Database object.  Results caching would eliminate the problem of
the same queries beign run over and over because only the first
invocation of a query would actually query the database.  All
subsequent queries would hit the cache instead.

The problem is the mechanics of actually implementing this caching.
I'm using prepared statements almost exclusivly throughout the design,
meaning that the PDOStatement class probably needs to be extended
somehow and my Database prepare() method should return my extended
class instead of a generic PDOStatement.  The Database class does
caching of PDOStatements to save the overhead of each instance of a
class attempting to prepare the same statement over and over again.
If I can extend PDOStatement to cache its results the nthe repeated
query overhead will be more or less eliminated.  The problem is what
happens when a database modifying event occurs?  The easiest thing to
do would be to invalidate the cache in all existing PDOStatements,
which while it's not the most efficiant solution would be the safest.
but when one PDOStatement executes a modifying query then how will all
the other PDOStatements know?

This is why I was interested in caching at the database level, as it
would sidestep this problem.

I can't have been the first person to run up against this problem,
somebody somewhere must have implemented a framework for PDO that can
handle caching of results from prepared statements. I've just not been
able to find one.  Please, if anyone out there knows something I don't
then please let me know.

PS: For those of you who replied via private mail instead of in the
group, thank you for the responses, I appreciate the help and advice.
but could you please reply in the group?  Those messages get sent to
my home address, and I'm posting this from work, so I won't see them
until I get home.  Besides, it'll be helpful for the next poor sod who
hits this problem if there's a publicly viewable thread that'll turn
up in Google :)

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org/