AlexK987 <alex.cue.987@xxxxxxxxx> writes: > I've created a GIN index on an INT[] column, but it slows down the selects. > Here is my table: > create table talent(person_id INT NOT NULL, > skills INT[] NOT NULL); > insert into talent(person_id, skills) > select generate_series, array[0, 1] || generate_series > from generate_series(3, 1048575); > create index talent_skills on talent using gin(skills); > analyze talent; > Here is my select: > explain analyze > select * from talent > where skills <@ array[1, 15] Well, that's pretty much going to suck given that data distribution. Since "1" is a member of every last entry, the GIN scan will end up examining every entry, and then rejecting all of them as not being true subsets of [1,15]. I'm not sure whether it'd be practical to teach GIN about negative proofs, ie noticing that rows containing "0" could be eliminated based on the index contents. But in any case it does not know that today. Another problem, at least with the default statistics target, is that the entries for "0" and "1" swamp everything else so that the planner doesn't know that eg "15" is really rare. You'd be much better off if you could refactor the data representation so that whatever you mean by "0" and "1" is stored separately from whatever you mean by the other entries, ie, don't keep both extremely common and extremely rare entries in the same array. Also ... perhaps I'm reading too much into the name you chose for the column, but I'm finding it hard to imagine why you'd care about the performance of this query as opposed to "where skills @> array[1, 15]". That is, wouldn't you typically be searching for people with at least certain specified skills, rather than at most certain specified skills? Another thing that maybe is a question for -hackers is why we consider arraycontained to be an indexable operator at all. As this example demonstrates, the current GIN infrastructure isn't really capable of coping efficiently, at least not in the general case. It might be all right in specific cases, but the planner doesn't have the smarts to tell which is which. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance