I am researching how to set up a schema for querying a set of tags associated with an object. There are approximately 100 distinct tags in my application (these will be pre-populated), and I am expecting a fairly low number of distinct sets of these tags -- in other words, a given set of tags will most likely be reused by many objects. There will be on the order of thousands to hundreds-of-thousands of records in the 'objects' table. I could use the standard many-to-many approach for this, that I've used many times in the past: CREATE TABLE TAGS ( ID SMALLINT PRIMARY KEY, NAME VARCHAR); CREATE TABLE OBJECTS ( ID SERIAL PRIMARY KEY, FOO VARCHAR); CREATE TABLE OBJECT_TAGS ( OBJECT_ID INT, TAG_ID SMALLINT, PRIMARY KEY (OBJECT_ID, TAG_ID)); However, I've run across a couple of articles suggesting that I could use an array field to store the tags on the object, and use a GIN index to do set comparisons, e.g: http://sjsnyder.com/using-postgresql-arrays-with-gin-indexes-for. Though note that in that article they are storing each tag in the 'tags' field as text, whereas I think it would make more sense to store the id of each tag in the 'tags' field, and cache the mapping from tag names to tag ids in my application: CREATE TABLE TAGS ( ID SMALLINT PRIMARY KEY, NAME VARCHAR); CREATE TABLE OBJECTS ( ID SERIAL PRIMARY KEY, TAGS SMALLINT[], FOO VARCHAR); CREATE INDEX OBJECTS_TAGS_INDEX ON OBJECTS USING GIN (TAGS); I like this array-plus-gin-index approach because the object id is not repeated over and over again as it is in the many-to-many link table. It seems like it would be much more efficient. Another advantage is that the queries are so much more legible -- it just looks a lot cleaner to me. However, the PostgreSQL manual's chapter on arrays (http://www.postgresql.org/docs/9.2/static/arrays.html) does not mention GIN indexes. Instead, it has a tip saying "Arrays are not sets; searching for specific array elements can be a sign of database misdesign." Of course the manual is correct if the array field is unindexed, but I find the fact that the GIN-indexing approach is not mentioned to be puzzling. So, my question is: Is it in fact a good idea to use the integer-array-plus-GIN-index approach as a way to store sets in-line? Does anyone actually do this in production? Or is the old-school many-to-many relationship still the way to go? Of course I'm going to try both approaches out myself and see how it works, but I'm interested in the communities opinions on this subject (particularly since I'm new to PostgreSQL, having mostly used commercial databases in the past). Thanks, Mike Jarmy -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general