I am unclear about the rules which postgresql uses in building a result
set when the select list ( see note ) contains several different
Set-Returning functions, possibly with sets of different sizes.
At a very high level, what I see happening is that each Set-Returning
expression causes the result set to be expanded with additional rows,
one per member of the set.
When there is only one such expression, this is clear - the size
of the final result set is N times the size of the output of the final
operator where N is the size of the set (i.e. N times the size if that
expression had been a scalar).
But it becomes unclear where there is more than one such expression.
I can't find much on this point in the doc (I am looking in PostgreSQL
13.2 Documentation). There is one small example in
"9.25. Set Returning Functions"
but in this example there is only one Set-Returning expression in the
select list.
Also the description of the SELECT command mentions something called a
"row group" :
"The actual output rows are computed using the SELECT output expressions
for each selected row or row group. (See SELECT List below.)"
but no mention of Set-Returning expressions in the referenced section.
I tried an experiment :
<code>
echo -e "DROP SCHEMA testing cascade;
CREATE SCHEMA testing;
CREATE TABLE testing.set_returning_in_select_list (id int4 , bitmask_a
int , bitmask_b int );
insert into testing.set_returning_in_select_list values ( 1 ,
x'1010'::int , x'0101'::int ) , ( 2 , x'2020'::int , x'0202'::int ) , (
3 , x'3030'::int , x'0303'::int );
select id , to_hex(bitmask_a) as hex_a , unnest( array [ ( bitmask_a &
16 ) , ( bitmask_a & 32 ) , ( bitmask_a & 48 ) ] ) as bits_of_a ,
to_hex(bitmask_b) as hex_b from
testing.set_returning_in_select_list order by bits_of_a;
select id , to_hex(bitmask_a) as hex_a , unnest( array [ ( bitmask_a &
16 ) , ( bitmask_a & 32 ) , ( bitmask_a & 48 ) ] ) as bits_of_a ,
to_hex(bitmask_b) as hex_b , unnest( array [ ( bitmask_b
& 1 ) , ( bitmask_b & 2 ) , ( bitmask_b & 3 ) ] ) as bits_of_b
from testing.set_returning_in_select_list order by bits_of_a;
select id , to_hex(bitmask_a) as hex_a , unnest( array [ ( bitmask_a &
16 ) , ( bitmask_a & 32 ) , ( bitmask_a & 48 ) ] ) as bits_of_a ,
to_hex(bitmask_b) as hex_b , unnest( array [ ( bitmask_b
& 3 ) , ( bitmask_b & 2 ) , ( bitmask_b & 1 ) ] ) as bits_of_b ,
unnest( array [ ( bitmask_b & 3 ) , ( bitmask_b & 2 ) ,
( bitmask_b & 1 ) , ( bitmask_b & 17 ) ] ) as extraneous_bits_of_b
from testing.set_returning_in_select_list order by bits_of_a;" | psql -e
</code>
The output of the last query is, firstly, different on postgresql-9.6
than on 13.3, and secondly, hard to explain in either case.
my outputs appended below.
I *think* :
. postgresql-9.6 applies a kind of triple cartesian join on the
elements of each set, joining one set with the other set and then again
with itself, ending up with many duplicates
. postgresql-13.3 applies a kind of null-padding extension of the
elements of the smaller set and then an arbitrary linear pairing of the
two sets, ending up with one duplicate.
Which one, if any, is correct? And why?
(note : I am not asking about the case of Set-Returning functions in the
FROM clause, where things seem clear(er).)
Cheers, John Lumby
My outputs of the final query :
. postgresql-9.6
<code>
id | hex_a | bits_of_a | hex_b | bits_of_b | extraneous_bits_of_b
----+-------+-----------+-------+-----------+----------------------
1 | 1010 | 0 | 101 | 0 | 0
1 | 1010 | 0 | 101 | 0 | 1
1 | 1010 | 0 | 101 | 0 | 1
1 | 1010 | 0 | 101 | 0 | 1
2 | 2020 | 0 | 202 | 2 | 2
2 | 2020 | 0 | 202 | 2 | 2
2 | 2020 | 0 | 202 | 2 | 0
2 | 2020 | 0 | 202 | 2 | 0
1 | 1010 | 16 | 101 | 1 | 1
1 | 1010 | 16 | 101 | 1 | 1
1 | 1010 | 16 | 101 | 1 | 1
1 | 1010 | 16 | 101 | 1 | 0
1 | 1010 | 16 | 101 | 1 | 1
1 | 1010 | 16 | 101 | 1 | 1
1 | 1010 | 16 | 101 | 1 | 0
1 | 1010 | 16 | 101 | 1 | 1
3 | 3030 | 16 | 303 | 3 | 3
3 | 3030 | 16 | 303 | 3 | 1
3 | 3030 | 16 | 303 | 3 | 1
3 | 3030 | 16 | 303 | 3 | 2
2 | 2020 | 32 | 202 | 2 | 0
2 | 2020 | 32 | 202 | 2 | 2
2 | 2020 | 32 | 202 | 2 | 0
2 | 2020 | 32 | 202 | 0 | 0
2 | 2020 | 32 | 202 | 2 | 2
2 | 2020 | 32 | 202 | 0 | 2
2 | 2020 | 32 | 202 | 0 | 0
2 | 2020 | 32 | 202 | 0 | 2
3 | 3030 | 32 | 303 | 2 | 2
3 | 3030 | 32 | 303 | 2 | 3
3 | 3030 | 32 | 303 | 2 | 1
3 | 3030 | 32 | 303 | 2 | 1
3 | 3030 | 48 | 303 | 1 | 3
3 | 3030 | 48 | 303 | 1 | 2
3 | 3030 | 48 | 303 | 1 | 1
3 | 3030 | 48 | 303 | 1 | 1
(36 rows)
</code>
. postgresql-13.3
<code>
id | hex_a | bits_of_a | hex_b | bits_of_b | extraneous_bits_of_b
----+-------+-----------+-------+-----------+----------------------
1 | 1010 | 0 | 101 | 0 | 0
2 | 2020 | 0 | 202 | 2 | 2
1 | 1010 | 16 | 101 | 1 | 1
1 | 1010 | 16 | 101 | 1 | 1
3 | 3030 | 16 | 303 | 3 | 3
2 | 2020 | 32 | 202 | 0 | 0
2 | 2020 | 32 | 202 | 2 | 2
3 | 3030 | 32 | 303 | 2 | 2
3 | 3030 | 48 | 303 | 1 | 1
1 | 1010 | | 101 | | 1
3 | 3030 | | 303 | | 1
2 | 2020 | | 202 | | 0
(12 rows)
</code>