Re: Two efficiency questions - clustering and ints

Richard Huxton <dev@xxxxxxxxxxxx> · Fri, 06 Oct 2006 09:56:32 +0100

John D. Burger wrote:
I have a good-size DB (some tables approaching 100M rows), with 
essentially static data.

Should I always cluster the tables?  That is, even if no column jumps 
out as being involved in most queries, should I pick a likely one and 
cluster on it?  (Of course, this assumes that doing so won't cause bad 
correlation with any other oft-used column.)

Well you cluster on an index, and if you don't think the index is 
useful, I'd drop it. If you have an index, clustering isn't necessarily 
going to help you unless you regularly read a series of rows in order.

Another question, about integer types - if no cross-type coercion is 
involved, is there any reason not to choose the smallest int type that 
will fit my data?  In particular, I have a column of small-integer 
ratings with, say, values in [1, 10].  If I'm only comparing within such 
ratings, and possibly computing floating point averages, etc., what are 
the good and bad points of using, say, SMALLINT?  What about NUMERIC(1) 
or (2)?

(int2, int2) should pack into 4 bytes on-disk, but due to alignment 
issues I think (int2, int4) still takes up 8 bytes. There has been 
discussion about being able to have different physical column ordering 
on-disk vs. in SQL but no decision as to whether the effort will be 
worthwhile.

Numeric types tend to be slower than their int equivalent, and though 
I've not checked their storage requirements, I'd assume they take more 
space too.

HTH
--
  Richard Huxton
  Archonet Ltd