I was inspired by this article to store xml documents I want to query in a single column: http://robots.thoughtbot.com/post/13829210385/the-durable-document-store-you-didnt-know-you-had-but But I'm finding it difficult to get the query plan to recognize or use the indexes efficiently; I'm using the indexes he suggests for each xpath value. In my case they look like this: CREATE INDEX ind_entity_name ON forms USING GIN ( CAST(xpath('//primaryEntity/entityName/text()', data) as TEXT[]) ); CREATE INDEX ind_prior_entity_name ON forms USING GIN ( CAST(xpath('//primaryEntity/entityPreviousNameList/value/text()', data) as TEXT[]) ); The first xpath -- '//primaryEntity/entityName/text()' -- has only one possible value, so when I query it like this, the explain seems to show that the index is being used (the forms' table has 7,085 rows): mydb=> explain select id from forms where (xpath('//primaryEntity/entityName/text()', data))[1]::text ~* '^banc'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Seq Scan on forms (cost=0.00..770.27 rows=11 width=16) Filter: (((xpath('//primaryEntity/entityName/text()'::text, data, '{}'::text[]))[1])::text ~* '^banc'::text) (2 rows) But the second xpath -- '//primaryEntity/entityPreviousNameList/value/text()' -- points to multiple possible values within the xml document, so to make sure that I search them all, I have to write the query like this; in this case, it seems the index is *not* being used and it's doing a full table scan: mydb=> explain select entity.id from (select id, xpath(''//primaryEntity/entityPreviousNameList/value/text()', data), generate_subscripts(xpath(''//primaryEntity/entityPreviousNameList/value/text()', data), 1) as s1 from forms) as entity where entity.xpath[s1]::text ~* '^banc'; QUERY PLAN ------------------------------------------------------------------ Subquery Scan on entity (cost=0.00..876.55 rows=11 width=16) Filter: ((entity.xpath[entity.s1])::text ~* '^banc'::text) -> Seq Scan on forms (cost=0.00..787.99 rows=7085 width=48) (3 rows) So is there a way to create an index which will help the performance of this type of query? Or is there a way of writing the second query differently that will used the existing index better? Or should I just normalize the data from the documents I want to query into other tables, and use simpler joins? -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general