-----Original Message----- From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx] Sent: Wednesday, August 31, 2011 7:10 PM To: David Johnston Cc: pgsql-general@xxxxxxxxxxxxxx Subject: Re: ARRAY_AGG(DISTINCT a ORDER BY b) Error "David Johnston" <polobo@xxxxxxxxx> writes: > "in an aggregate with DISTINCT, ORDER BY expressions must appear in > argument list" > Why? Because the results are ill-defined otherwise. In your example, > ... ARRAY_AGG(DISTINCT accountnumber ORDER BY amount DESC) ... there may be many rows with the same "accountnumber" and yet different "amount" values. Which of those amounts should be used as the sort key for the aggregated row? regards, tom lane ---------------------------------- My take is that you look at the non-aggregated data using the same ORDER BY clause and then add the values in the order they appear in the detail. So (account, amount, [Action]): 1000, 150, [Add] 1000, 130, [Skip] 2000, 120, [Add] 2000, 100, [Skip] 1000, 50, [Skip] 2000, 0, [Skip] 3000, -10, [Add] 1000, -50, [Skip] This is basically how a plain DISTINCT would have to work (taking the first encountered value and skipping any subsequent repeats). I want the same general behavior but have the opportunity to order the input "frame" so that I can manipulate the encounter order. In a 2-phase process you would do the following (I will remain specific for the time being): Phase 1: Select the representative record for each DISTINCT (accountnumber); The representative would be the FIRST record as determined via an ORDER BY of all candidate records. Phase 2: Order the DISTINCT representatives based upon the same ORDER BY clause So: Phase 1: (3000, -10), (1000, 150), (2000, 120) [Not Ordered] Phase 2: (1000, 150), (2000, 120), (3000, -10) [ORDER BY amount DESC] The behavior/concept of [ARRAY_AGG(DISTINCT ON (accountnumber) .?. ORDER BY amount] matches here but I do not want to output an amount column at all. The other view is that you can create the ARRAY using just the ORDER BY and then immediately "DISTINCT"ify the array so that there are no duplicates. This is basically the 2-phase process described above. In this view you basically keep the DISTINCT value that has the lowest array index. I would be interested in other possible interpretations/algorithms that would then cause ambiguity in deciding which algorithm to implement. I know that I am only considering "ARRAY_AGG" in my examples but my first reaction is that other aggregates would behave acceptably under the algorithm described; and since the current behavior is FAIL at least some well-defined behavior would exist for the specified syntax. David J. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general