Wes wrote:
Indexes are too fragile. Our documents will be offline, and re-indexing
would be impossible. Additionally, as I understand it, tsearch2 doesn't
scale to the numbers I need (hundreds of millions of documents).
Jeff's right about tsvector - sounds like it's what you're looking for.
If you're worried about reindexing costs, perhaps look at partitioning the
table, or using partial indexes (so you could have multiple indexes for
each table, based on (id mod 100) or some such).
Obviously, partitioning over multiple machines is usually quite do-able
for this sort of task too.
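
As a rough sketch of the partial-index idea (not a tested recipe), the
script below just generates the CREATE INDEX statements, one per bucket of
(id mod N). The table name "documents", the tsvector column "body_tsv", the
index naming scheme and the choice of index method are all assumptions made
for illustration; adjust them to your schema and PostgreSQL version.

```python
def partial_index_ddl(table, column, buckets, method="gin"):
    """Return one CREATE INDEX statement per bucket of (id % buckets).

    All identifiers passed in are hypothetical; nothing here talks to a
    database, it only builds SQL text for review.
    """
    statements = []
    for bucket in range(buckets):
        statements.append(
            f"CREATE INDEX {table}_{column}_p{bucket:02d} "
            f"ON {table} USING {method} ({column}) "
            f"WHERE (id % {buckets}) = {bucket};"
        )
    return statements


if __name__ == "__main__":
    # Four buckets keeps the output short; the suggestion above was 100.
    for stmt in partial_index_ddl("documents", "body_tsv", 4):
        print(stmt)
```

Each index then covers only a slice of the table, so any one of them can be
rebuilt without touching the rest. Bear in mind that the planner will only
consider a partial index when the query's WHERE clause implies the index
predicate, so queries would need to carry a matching (id % N) = k condition.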
Is anyone aware of any such solutions for PostgreSQL, open source or
otherwise?
Without wishing to discourage a potential large user from PG, it might
be worth checking if Google/Yahoo/etc have a non-relational server that
meets your needs off-the-shelf.
--
Richard Huxton
Archonet Ltd