Thank you very much Michael. Your inputs are very helpful for me. Just
have one small question, the example you gave is based on postgresql
8.1, does it apply to PostgreSQL 8.0.1 as well (i686-pc-linux-gnu,
compiled by GCC gcc 3.3.2)?
- Emi
. id is the primary key, so a default unique index is generated
automatically ?
Yes. When you issue the CREATE TABLE statement you should see a
notice like the following:
CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
You can see the index if you look at the table's description, such
as when doing "\d test" in psql.
. if queries based on name are often, index should be setup for name?
if there are 30,000 records, 29,000 records' names are different, will
the index for name still be useful?
If you have 29,000 unique names out of 30,000 rows then an index
should definitely speed up queries by name.
. possible values for sex are F/M and null, should we setup index for sex?
Probably not, although 8.1 can make better use of indexes on
low-cardinality columns than previous versions could. If you're
using 8.1 then try running typical queries with and without such
an index to see if it makes much difference. EXPLAIN ANALYZE will
show whether the index is being used. Unless you see a significant
improvement in query performance then don't bother indexing this
column.
. How about index for date and timestamp?
Probably, if you regularly query on those columns.
Basically, I'd like to know is there a percentage of the differences
among data to decide whether index will help or not? For example, among
30,000 records, for a column, its value choices are less than A% and
greater than B% so that we know index will help a lot?
There's no absolute rule; among other things physical order on disk
influences the planner's decision to use an index. If a table is
clustered on a particular index then the planner might use that
index to fetch 80% of the table's rows, but if the data is randomly
scattered then the planner might prefer a sequential scan to fetch
only 3% of the rows.
If you regularly query a column that has more than a handful of
distinct values then queries will probably benefit from an index
on that column; in 8.1 even queries against low-cardinality columns
might benefit from an index. If you perform a lot of queries for
values that are close together, or if queries for a certain value
tend to return a lot of rows, then consider clustering the table
on that column's index (and be sure to run ANALYZE afterwards to
update the table's statistics). Indexes have costs, however: they
take up disk space and they can slow down other operations like
inserts and updates because each index has to be updated as well.
Create whatever indexes you need to realize a significant improvement
in query performance, but don't overdo it.