On Sat, Feb 26, 2011 at 2:38 PM, Dave Johansen <davejohansen@xxxxxxxxx> wrote: > > Unfortunately, I'm running 8.3.3 and to my knowledge the windowing stuff > wasn't added til 8.4. > Dave > > On Feb 26, 2011 2:06 PM, "Josh Berkus" <josh@xxxxxxxxxxxx> wrote: > > Dave, > > > > Why not test the windowing version I posted? We finally have moved over to 8.4 and so I just wanted to post the time comparison numbers to show the times on 8.4 as well. This is also a newer data set with ~700k rows and ~4k distinct id_key values. 1) Dependent subquery SELECT a.id_key, a.time_stamp, a.value FROM data AS a WHERE a.time_stamp = (SELECT MAX(time_stamp) FROM data AS b WHERE a.id_key = b.id_key); 8.3.3: Killed it after a few minutes 8.4.13: Killed it after a few minutes 2) Join against temporary table SELECT a.id_key, a.time_stamp, a.value FROM data AS a JOIN (SELECT id_key, MAX(time_stamp) AS max_time_stamp FROM data GROUP BY id_key) AS b WHERE a.id_key = b.id_key AND a.time_stamp = b.max_time_stamp; 8.3.3: 1.4 s 8.4.13: 0.5 s 3) DISTINCT ON: SELECT DISTINCT ON (id_key) id_key, time_stamp, value FROM data ORDER BY id_key, time_stamp DESC; Without Index: 8.3.3: 34.1 s 8.4.13: 98.7 s With Index (data(id_key, time_stamp DESC)): 8.3.3: 3.4 s 8.4.13: 1.3 s 4) Auto-populated table SELECT id_key, time_stamp, value FROM data WHERE rid IN (SELECT rid FROM latestdata); 8.3.3: 0.2 s 8.4.13: 0.06 s 5) Windowing SELECT id_key, time_stamp, value FROM (SELECT id_key, time_stamp, value, row_number() OVER (PARTITION BY id_key ORDER BY time_stamp DESC) AS ranking FROM data) AS a WHERE ranking=1; 8.3.3: N/A 8.4.13: 1.6 s So the auto-populated table (#4) is the fastest by an order of magnitude, but the join against the temporary table (#2) is the next best option based on speed and doesn't require the extra multi-column index that DISTINCT ON (#3) does. On a related note though, is there a way to make the multi-column index used in the DISTINCT ON more efficient. Based on the results, it appears that the multi-column index is actually a single index with the ordering of the tree based on the first value and then the second value. Is there a way to make it be a "multi-level index"? What I mean is that the first value is basically a tree/hash that then points to the second index because if that's possible then that would probably make the DISTINCT ON (#3) version as fast or faster than the auto-populated table (#4). Is there a way to create an index like that in postgres? Thanks, Dave -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance