> On Wed, Jan 14, 2009 at 07:36:03PM -0800, mailinglists@xxxxxxxxxxxxxxx > wrote: >> CREATE TABLE listings ( >> trans_id SERIAL, >> mode CHAR(1), >> listing_id INT, >> region_id INT, >> category INT >> ); >> >> "SELECT * FROM listings ORDER BY region_id, category, listing_id, >> trans_id" > >> [...] what I want to do is get only the last transaction for >> a given listing_id, because the earlier ones don't matter. > > If you have an index on (region_id,category,listing_id,trans_id) you > should be able to do: > > SELECT region_id,category,listing_id,MAX(trans_id) > FROM listings > GROUP BY region_id,category,listing_id; > > And have PG answer this using the index (it'll only do this if it thinks > there are many transactions for each group though). > >> On top of >> that, each region_id and category_id has its own index. I need to be >> able >> to process the indexes in-full, one-at-a-time because there are too many >> to hold that many open filehandles/processes at one time. > > Not sure what you mean by "index" here; I'm assuming you're talking > about something outside PG, or am I missing some context? Yes, sorry I wasn't clear... This is to keep track of incremental updates to an index outside of Postgres >> So, my question is, is there some way to return the rows in a >> deterministic order, without actually having to do an explicit sort on >> the >> data? What I mean is, I don't care if category_id 4 / region_id 10 / >> listing_id 10000 comes before category_id 1 / region_id 1 / lisitng_id 1 >> -- I just need them returned to me in that sort of grouped order >> (although >> sorted by trans_id). > > If you want to know all the transaction ids then you need to do the > sort, if you only want the largest/latest then you're probably better > off telling PG that's what you want (i.e. using GROUP BY and MAX > aggregate and letting it make an appropiate decision). > I only need to know the max id, but the problem is that for each region_id has N categories and I need to process each one individually. The query you suggested does, indeed, sort them by category, but it does not sort them by region_id. Here's an example: (select region_id,category,listing_id,max(trans_id) from listings_pending where region_id IN('3134000000', '2222000000') group by region_id,category,listing_id;) region_id | category_id | listing_id | max ------------+----------------+------------+-------- 2222000000 | 1 | 2221473 | 640799 2222000000 | 1 | 2426142 | 845468 2222000000 | 1 | 2103599 | 522925 3134000000 | 1 | 2146326 | 565652 2222000000 | 1 | 2462112 | 881438 2222000000 | 1 | 1947690 | 367016 2222000000 | 1 | 2526731 | 946057 2222000000 | 1 | 2217864 | 637190 2222000000 | 1 | 2288420 | 707746 As you can see, at transaction 565652, I would close the index (which is very expensive, because it actually has to do a "merge" of the newly created index with old one), then reopen it at transaction 881438. Thanks for your help! - Greg -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general