On Tue, Sep 20, 2011 at 7:43 PM, Thomas Kappler <tkappler@xxxxxxxxxxxxxx> wrote: > [please CC, I'm not on the list] > > Hi all, > > we have one table that basically uses Postgres as a key-value store. > > Table "public.termindex" > Column | Type | Modifiers > -------------+---------+----------- > subject_id | integer | > indextype | integer | > cid | integer | > > This is with Postgres 9.0. > > The table has 96 million rows and an index on each column. It contains > no NULLs and has no triggers. > > subject_id has about 2m distinct values, cid about 200k, and indextype only six. > > The table is *read-only* after the initial load. > > The query we want to do is (with example values): > > select t.cid, count(distinct t1.subject_id) > from termindex t1, termindex t2 > where t1.cid=20642 and t1.indextype=636 and t2.indextype=636 and > t2.subject_id=t1.subject_id > group by t2.cid; Do you have any multi column indexes? From the text of your query it seems it could benefit from these two indexes: (cid, indextype) (subject_id, indextype) I do not know whether PostgreSQL can avoid the table if you make the first index (cid, indextype, subject_id) in other words: append all the columns needed for the join. In theory the query could then be satisfied from the indexes. > Pasting the explain analyze output into > http://explain.depesz.com/s/Yr4 we see that Postgres is doing an > external sort using about 150MB of data. > > Now, we're not Postgres experts, or even great at relational design. > Are there better ways of doing that query, or designing the table? For > the latter we do have a number of constraints, though, that I don't > want to outline now because this mail is already long enough. Those are probably important to know. Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance