Chad, On 1/30/07 7:03 AM, "Chad Wagner" <chad.wagner@xxxxxxxxx> wrote: > On 1/30/07, Luke Lonergan <llonergan@xxxxxxxxxxxxx> wrote: >> Not that it helps Igor, but we've implemented single pass sort/unique, >> grouping and limit optimizations and it speeds things up to a single seqscan >> over the data, from 2-5 times faster than a typical external sort. > > Was that integrated back into PostgreSQL, or is that part of Greenplum's > offering? Not yet, we will submit to PostgreSQL along with other executor node enhancements like hybrid hash agg (fixes the memory overflow problem with hash agg) and some other great sort work. These are all "cooked" and in the Greenplum DBMS, and have proven themselves significant on customer workloads with tens of terabytes already. For now it seems that the "Group By" trick Brian suggested in this thread combined with lots of work_mem may speed things up for this case if HashAgg is chosen. Watch out for misestimates of stats though - hash agg may overallocate RAM in some cases. >> I can't think of a way that indexing would help this situation given the >> required visibility check of each tuple. > > I agree, using indexes as a "skinny" table is a whole other feature that would > be nice. Yah - I like Hannu's ideas to make visibility less of a problem. We're thinking about this too. - Luke