Ron Mayer wrote:
>> ...The inner sets are on average 3,000 for both id1 and id2, and a
>> typical limit would be 100, so if I could convince postgresql not to
>> fetch all of them, I would reduce the set retrieved by around 60x.
>> The dataset is quite large, so a random query is not very likely to
>> be hitting the same part of the dataset again; there is going to be
>> a fair amount of going to disk.
>
> If disk seeks are killing you, a kinda crazy idea would be to
> duplicate the table - clustering one by (id1) and the other one by
> an index on (id2) - and unioning the results of each.

That's doubling the disk space needed for the table. Is there any
chance this would still help when the initial table by itself already
significantly exceeds available memory?

> Since each of these duplicates of the table will be clustered by the
> column you're querying it on, it should just take one seek in each
> table.
>
> Then your query could be something like
>
> select * from (
>   select * from t1 where id1=2067 order by evalue limit 100
>   union
>   select * from t2 where id2=2067 order by evalue limit 100
> ) as foo order by evalue limit 100;

This is actually what I ended up with as the best-performing query,
just still on a single table, because without duplication I can add
indexes and optimize this one via (id1,evalue) and (id2,evalue). It is
still getting killed quite a lot by disk IO. So I guess I'm up to:

1) Buy a better disk subsystem (I need to get an estimate of how large
   the data set is actually going to get).
2) Stick with one table, but make sure to have enough activity to keep
   a large part of the index in the OS cache anyway (and add more
   memory if necessary).

The data is seeing a fair amount of growth (it doubles in a couple of
years), so I would suspect it is fairly hard to maintain clustering on
it.

Is it possible to get PG to tell me how many rows fit in a disk page?
All columns are sitting in "plain" storage according to \d+ on the
table.

> Hmm.. and I wonder if putting evalue into the criteria to cluster
> the tables too (i.e. cluster on id1,evalue) if you could make it
> so the limit finds the right 100 evalues first for each table....

I didn't cluster it, since clustering "locks everything".

-- 
Jesper

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
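On the rows-per-page question in the thread: the planner statistics in the pg_class catalog expose roughly this. A sketch, assuming a hypothetical table name t and that VACUUM or ANALYZE has run recently (relpages and reltuples are estimates maintained by those commands, not live counts):

```sql
-- relpages  = size of the heap in pages (8 kB each by default)
-- reltuples = estimated row count
SELECT relname,
       relpages,
       reltuples::bigint AS approx_rows,
       (reltuples / NULLIF(relpages, 0))::numeric(10,1) AS rows_per_page
FROM pg_class
WHERE relname = 't';   -- 't' is a placeholder table name
```

For an exact page count rather than an estimate, `pg_relation_size('t') / current_setting('block_size')::int` reads the current file size directly.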