On Fri, Jul 10, 2009 at 1:41 PM, Ben Harper<rogojin@xxxxxxxxx> wrote: > > Unfortunately I can't use GROUP BY, because what I'm really doing is > SELECT DISTINCT ON(unique_field) id FROM table; You could do that using GROUP BY if you define a first() aggregate. In this case that would just be SELECT first(id) AS id from (select * from table ORDER BY unique_field, ...) GROUP BY unique_field. In cases with more fields it gets tiresome fast. In this case 8.4 won't actually help you. It only uses hash aggregates for DISTINCT not DISTINCT ON. > I'm not familiar with the Postgres internals, but in my own DB system > that I have written, I do the skip-scanning thing, and for my system > it was a really trivial optimization to code. Well things get tricky quickly when you have to deal with concurrent inserts and potentially page splits from other transactions. Also consider how hard it is to prove that the query falls into this category of queries. > Inside a GIS application, the user wants to categorize the display of > some information based on, in this case, the suburb name. > He clicks a button that says "Add All Unique Categories". This is a > very common operation in this domain. That doesn't look like what this query is doing to me. It's taking one exemplar from each suburb based on some other constraint (the minimum of whatever your order by key is) and taking the id of that data point. If the order by key doesn't specify a specific data point then it's a non-deterministic record. -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general