On Mon, 2012-03-12 at 15:57 -0700, Jim Ostler wrote: > I have a table that is around 20 GB, so I need to optimize as best as > possible the matching with another table on keywords across multiple > fields. I have around 10 fields that have keywords or phrases > delimited with the "or" operator "|". So it would be in the form of > "a | b | and jack | cd" . There are around 20 keywords or phrases > per field, and these keywords could be any word. > How static is the list of keywords? Is there any significance in their order? (I'm guessing there isn't). How many rows are there in the table, IOW how big is each row? I wonder if putting the keywords in a separate table with a many:many relationship with your big table would help. This would retain your ability to add or remove keywords without affecting the schema. The selection rules used to combine keywords would also run fast. I'm getting good performance from a mail archive where messages are selected using: - a 1:M relationship with a subject table - a M:M relationship with an address table - range comparison with the date sent (non-indexed field in the message) - and ILIKE(%phrase%) comparison with the message text The query is assembled using WHERE clauses for each of the four selection possibilities listed above before being prepared and issued. A where clause is only included if the user has specified a value for it. There is an AND relationship between the clauses. Currently there are around 130,000 messages in the database. Speed: it took 0.6 seconds to find 209 messages to or from an address in the last 5 years and containing the keyword 'f1a'. The same search without the date range specified ran in 0.86 seconds. It took 16.9 seconds to find the 12331 messages containing 'f1a' when no address or data range were specified. This is on a dual core, 3.2 GHz Athlon with 4GB of RAM using PG 9.1. The other possibility is to use a star schema with each keyword being a dimension - IOW a traditional data warehouse set-up, but of course this does require a static list of keywords to be defined. Martin -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general