Hi guys! This'll hopefully be an easy question for one of the perf guys
like Tom..
I'm wondering what the performance advantage is to using a clustered
index. Take the following example: I have a table of recipes with a
recipeid, and a table of comments that various users might have entered
about those recipes ("this tastes like crap!", etc) with a commentid and
a FK on recipeid. Naturally I want to be able to say "give me all the
comments about recipe x". To me, this seems like the perfect
opportunity to use a cluster. Inserts are fairly rare and reads need to
be super fast. If all the comments were clustered by recipeid, all the
data would be right next to each other on disk. It seems like the query
planner could count on this and make a lot of cool optimizations. It
could just find the first row with recipe x and traverse the rows until
it got to the next recipe.
However, the thing that really confuses me is I can insert as many rows
as I want, and they're not clustered. In fact, the only time they'll
cluster is if I run the "CLUSTER RecipeComments" command. Sure, I could
write a trigger to do this on insert, or I could write a script that
runs every midnight to do this, or what not. However, if the query
planner can't assume the data is in a certain order on disk, what's the
point of having this at all?
I'm quite sure SQL Server doesn't work this way and I'm not sure about
Oracle. Can someone enlighten me on the exact benefit of this? Thanks!!
Mike
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general