Re: DISTINCT and ORDER BY in aggregate functions on expressions(!)

Thorsten Glaser <tg@xxxxxxxxxxx> · Tue, 28 Feb 2023 02:22:47 +0100 (CET)

On Mon, 27 Feb 2023, Tom Lane wrote:

>Well, yeah.  Simplify it to
>
>  SELECT array_agg(DISTINCT x ORDER BY y) FROM mytable;

That’s… a bit too simple for this case.

>If there are several rows containing the same value of x and different
>values of y, which y value are we supposed to sort the unique-ified x
>value by?  It's an ill-defined query.

The problem here is that I do not have an ‘x’.

>For the specific example you give, it's true that any specific
>possible output of jsonb_build_object() would correspond to
>a unique set of cot.weekday, cot.from_hour, cot.to_hour values.

Not necessarily (see directly below), but why would that matter?
It should sort the generated JSON objects within the array.

>, or maybe thinking harder
>about why you're getting duplicates in the first place.

The application developer informed me that it’s entirely possible
that some user entered matching information twice. I don’t have
the exact query that produced duplicates easily in the history
(the downside of working with \i) and tests on other users didn’t
produce duplicates.

So, yes, filtering them out is indeed part of the task here.

On Mon, 27 Feb 2023, David G. Johnston wrote:

>So long as the function call itself is at least stable...:
>
>DISTINCT func_call(...) ORDER BY func_call(...)

aieee really?

(I’d hope jsonb_build_object to be.)

Is that better or worse than using the extra functions to sort…?

>Order By is evaluated AFTER and over the contents of the distinct-ified
>expression

That’s right and good, but the problem is that I do not seem to
have a syntax with which to refer to the distinct-ified expression
to use in the ORDER BY clause.

>> place; I am somewhat proud I even managed to write this with
>> JOINs and without any subqueries in the first place so I’m
>> hesitant to go that route.
>
>That pride seems misplaced.  Related to Tom's comment, the presence of the
>DISTINCTs is telling you that what you did is not good.  DISTINCT is almost
>always a code smell

Not about the DISTINCTs. I haven’t used JOIN much (usually WHERE
as inner join) nor on tables this massive, and this is my second
foray into aggregate functions only.

>Lastly, if you do need to care about normalizing the output of JSON you
>should consider writing a function that takes arbitrary json input and
>reformats it, rather than trying to build up json from scratch where every
>individual component needs to be aware and take action.  i.e., get rid of
>the ORDER BY also.  Maybe this belongs in an application layer with tooling
>that already provides this capability.

Funnily enough, both here and in the other place where I tried to
use JSON output, PostgreSQL (with COPY TO STDOUT) *is* the application
layer. Here I’m generating a CSV file; in the other situation I was
going to put the generated JSON directly into an HTTP result filehandle.

In the latter, I dropped that approach, output CSV and converted that
(by replacing newlines with “],[” and prepending “[[” and appending
“]]”) to JSON myself, which worked there as it was all-numeric. But
the frustration there was about unnecessary whitespace instead.

Both cases have in common that a, possibly huge, result set can be
directly streamed from PostgreSQL to the consumer, but the former
lacks just that tiny bit of functionality that would make it really
rock :/

I was asking here because perhaps either that missing functionality
can be considered, or to find out if there’s better ways to produce
that output, due to my inexperience with SQL. The ways I’m using do
work, and I’m relatively happy, but…

bye,
//mirabilos
-- 
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)

Re: DISTINCT and ORDER BY in aggregate functions on expressions(!)