What are the benefits of using a clustered index?

Mike Christensen <imaudi@xxxxxxxxxxx> · Mon, 16 Mar 2009 23:08:36 -0700

Hi guys!  This'll hopefully be an easy question for one of the perf guys 
like Tom..

I'm wondering what the performance advantage is to using a clustered 
index.  Take the following example: I have a table of recipes with a 
recipeid, and a table of comments that various users might have entered 
about those recipes ("this tastes like crap!", etc) with a commentid and 
a FK on recipeid.  Naturally I want to be able to say "give me all the 
comments about recipe x".  To me, this seems like the perfect 
opportunity to use a cluster.  Inserts are fairly rare and reads need to 
be super fast.  If all the comments were clustered by recipeid, all the 
data would be right next to each other on disk.  It seems like the query 
planner could count on this and make a lot of cool optimizations.  It 
could just find the first row with recipe x and traverse the rows until 
it got to the next recipe.

However, the thing that really confuses me is I can insert as many rows 
as I want, and they're not clustered.  In fact, the only time they'll 
cluster is if I run the "CLUSTER RecipeComments" command.  Sure, I could 
write a trigger to do this on insert, or I could write a script that 
runs every midnight to do this, or what not.  However, if the query 
planner can't assume the data is in a certain order on disk, what's the 
point of having this at all?

I'm quite sure SQL Server doesn't work this way and I'm not sure about 
Oracle.  Can someone enlighten me on the exact benefit of this?  Thanks!!

Mike

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general