AlexK987 <alex.cue.987@xxxxxxxxx> writes: > This is a realistic case: everyone have Python and Java skills, but PostGis > and Haskell and Closure are rare. If we are looking for a person that has > all the skills required for a task (array[1, 15]), that is "skills <@ > array[1, 15] " and not the opposite, right? One of us has this backwards. It might be me, but I don't think so. Consider a person who has the two desired skills plus skill #42: regression=# select array[1,15,42] <@ array[1,15]; ?column? ---------- f (1 row) regression=# select array[1,15,42] @> array[1,15]; ?column? ---------- t (1 row) > Also can you explain why " entries for "0" and "1" swamp everything else so > that the planner > doesn't know that eg "15" is really rare. " I thought that if a value is not > found in the histogram, than clearly that value is rare, correct? What am I > missing here? The problem is *how* rare. The planner will take the lowest frequency seen among the most common elements as an upper bound for the frequency of unlisted elements --- but if all you have in the stats array is 0 and 1, and they both have frequency 1.0, that doesn't tell you anything. And that's what I see for this example: regression=# select most_common_elems,most_common_elem_freqs from pg_stats where tablename = 'talent' and attname = 'skills'; most_common_elems | most_common_elem_freqs -------------------+------------------------ {0,1} | {1,1,1,1,0} (1 row) With a less skewed distribution, that rule of thumb would work better :-( regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance