thanks for your answer.
the aggregate function I was talking about is the function I need to use for the non-group by columns like min() in my example.
There are of course several function to choose from, and I wanted to know which causes as less as possible resources.
best regards,
Uwe
On 18 April 2011 18:19, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
On Wed, Mar 16, 2011 at 4:45 AM, Uwe Bartels <uwe.bartels@xxxxxxxxx> wrote:Hmm. It seems to me that there's no way to work out the distinct
> I'm having trouble with some sql statements which use an _expression_ with
> many columns and distinct in the column list of the select.
> select distinct col1,col2,.....col20,col21
> from table1 left join table2 on <join condition>,...
> where
> <other expressions>;
>
> The negative result is a big sort with teporary files.
> -> Sort (cost=5813649.93..5853067.63 rows=15767078 width=80)
> (actual time=79027.079..81556.059 rows=12076838 loops=1)
> Sort Method: external sort Disk: 1086096kB
> By the way - for this query I have a work_mem of 1 GB - so raising this
> further is not generally possible - also not for one special command, due to
> parallelism.
>
> How do I get around this?
values without either sorting or hashing the output, which will
necessarily be slow if you have a lot of data.
You might try SELECT DISTINCT ON (key columns) <key columns> <non-key
> I have one idea and like to know if there any other approaches or an even
> known better solution to that problem. By using group by I don't need the
> big sort for the distinct - I reduce it (theoreticly) to the key columns.
>
> select <list of key columns>,<non key column>
> from tables1left join table2 on <join condition>,...
> where
> <other conditions>
> group by <list of key columns>
columns> FROM ...
Not sure I follow this part.
> Another question would be what's the aggregate function which needs as less
> as possible resources (time).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company