On Mon, 25 Jul 2005 17:50:55 -0400, Kevin Murphy <murphy@xxxxxxxxxxxxxxx> wrote:

>and because the number of possible search terms is so large, it
>would be nice if the entire index could somehow be preloaded into memory
>and encouraged to stay there.

Postgres does not have such a feature, and I wouldn't recommend messing
around inside Postgres.  You could try to copy the relevant index
file(s) to /dev/null to populate the OS cache (see the P.S. below for a
way to locate those files) ...

>There are 1.3M unique values in that column.

That'd mean that each value occurs about 10 times on average.  In your
tests the planner consistently estimates 81 matching rows, and the real
numbers are even higher.  Can this be explained by the nature of the
data distribution?

>and from postgresql.conf:
>shared_buffers = 15000
>work_mem = 32768
>Everything else in postgresql.conf is default.

Setting effective_cache_size to a sane value wouldn't hurt; it tells
the planner how much memory the OS has available for caching disk
blocks, measured in 8 KB pages, and affects cost estimates only, not
actual memory allocation.  I don't know about OS X; does it, like
Linux, automatically tune its disk cache, or do you have to configure
it somewhere?

>tagged_genes table:
>13,982,464 rows
>422,028 pages (although about half of that is the experimental tsvector
>column, though!)
>The index of the query column (mention) is 226,229 pages (= 1.7 GB?).

The average tuples-per-page ratio seems a bit low, both for the heap
(~33) and for the index (~62).  If the planner's tuple size estimate of
67 bytes is approximately right, there's a lot of free space in your
relations.  Try VACUUM FULL and REINDEX, or CLUSTER, to shrink these
files; exact commands are sketched in the P.S. below.

>create table tagged_genes (
>    id bigint NOT NULL PRIMARY KEY,  -- artificial primary key
>    mention text,                    -- a gene name or description
>    pmid bigint,                     -- identifies the document that
>                                     -- the mention occurs in
>    create_date timestamp NOT NULL,
>    vector tsvector                  -- experimental tsearch2 index of
>                                     -- mention column
>);
>create index tg_mention_idx on tagged_genes(mention);
>create index tg_pmid_idx on tagged_genes(pmid);
>create index tg_vector_idx on tagged_genes(vector);

If mention values are long (your examples don't suggest they are, but
even an int is smaller than any non-empty text) and there are many
duplicates, it might pay off to move them into their own table:

CREATE TABLE mentions (
    id      SERIAL PRIMARY KEY,
    mention text UNIQUE,
    vector  tsvector  -- experimental tsearch2 index
) WITHOUT OIDS;

create index me_vector_idx on mentions(vector);

and reference them from tagged_genes:

create table tagged_genes (
    id          bigint NOT NULL PRIMARY KEY,
    mentionid   int REFERENCES mentions,
    pmid        bigint,  -- identifies the document that
                         -- the mention occurs in
    create_date timestamp NOT NULL
) WITHOUT OIDS;

Unless my math is wrong -- roughly 28 bytes of data per row plus some
30 to 40 bytes of per-tuple overhead, times 14M rows, divided by 8 KB
per page -- this would result in a heap size of ~120K pages and an
index size of ~52K pages, plus some 10% slack for updated and deleted
tuples, if you VACUUM regularly.  A sketch of the data migration is in
the P.S. below as well.

Servus
 Manfred
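
P.S.  A few concrete, untested sketches for the suggestions above.

First, locating the index file(s) for the copy-to-/dev/null trick; the
pg_class/pg_database columns used here are standard, but the path and
file layout are from memory, so double-check before relying on them:

    -- Run this in the target database.  The index file lives at
    -- $PGDATA/base/<db_oid>/<relfilenode>, with .1, .2, ... extents
    -- once the relation exceeds 1 GB.
    SELECT d.oid AS db_oid, c.relfilenode
      FROM pg_class c, pg_database d
     WHERE c.relname = 'tg_mention_idx'
       AND d.datname = current_database();

Then, at the shell, something like

    cat $PGDATA/base/<db_oid>/<relfilenode>* > /dev/null

reads the index through the OS cache.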
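
Second, the commands for shrinking the existing files.  Both variants
rewrite a lot of data and take an exclusive lock on the table, so
schedule them for a quiet period:

    -- compact the heap, then rebuild the bloated indexes:
    VACUUM FULL tagged_genes;
    REINDEX TABLE tagged_genes;

    -- or rewrite the table in mention order; this rebuilds all of its
    -- indexes as a side effect, and the physical ordering should also
    -- help the scans on mention:
    CLUSTER tg_mention_idx ON tagged_genes;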
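
Third, a rough sketch of the data migration into the two-table layout,
using the column names from above; untested, and best done in a single
transaction while nothing else writes to tagged_genes:

    BEGIN;

    -- one row per distinct mention; since vector is derived from
    -- mention, it doesn't matter which duplicate DISTINCT ON picks
    INSERT INTO mentions (mention, vector)
    SELECT DISTINCT ON (mention) mention, vector
      FROM tagged_genes;

    -- build the slim table by joining back on the text value
    -- (assumes mention is never NULL; NULLs would be lost here)
    CREATE TABLE tagged_genes_new WITHOUT OIDS AS
    SELECT tg.id, m.id AS mentionid, tg.pmid, tg.create_date
      FROM tagged_genes tg
      JOIN mentions m ON m.mention = tg.mention;

    -- then add the PRIMARY KEY and the REFERENCES mentions
    -- constraint, recreate the index on pmid, drop the old table,
    -- and rename the new one.

    COMMIT;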