Re: parallel aggregation

David Rowley <dgrowleyml@xxxxxxxxx> · Wed, 12 Apr 2023 22:38:45 +1200

On Wed, 12 Apr 2023 at 22:14, Alexander Saydakov <saydakov@xxxxxxxxxxxx> wrote:
>
> I have a few questions regarding aggregate functions that would be parallel safe.
> 1. do the inputs of combinefunc always go through serialfunc-deserialfunc or they can come directly from sfunc (worker on the same machine, perhaps)?

Only aggregates with an INTERNAL transition state must be serialised
and deserialised.  Non-internal state aggregates, i.e. ones whose state
has a corresponding database type, are passed from the parallel workers
to the main process by the normal means we use to transfer tuples,
without any serialisation or deserialisation step.

All serialfuncs must return bytea and accept a single INTERNAL
parameter, so you can't even create a serialfunc for an aggregate
whose state type isn't INTERNAL.
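
To illustrate the shape of that contract, here is a minimal sketch in
Python rather than real PostgreSQL C code. AvgState is a hypothetical
stand-in for an INTERNAL state (something that only exists in process
memory), bytes stands in for bytea, and the fixed struct layout is an
assumption made purely for the example.

```python
import struct
from dataclasses import dataclass


@dataclass
class AvgState:
    """Hypothetical stand-in for an INTERNAL transition state."""
    total: float = 0.0
    count: int = 0


def serialfunc(state: AvgState) -> bytes:
    # INTERNAL -> bytea: flatten the in-memory state into a byte string
    # (little-endian double followed by a 64-bit integer).
    return struct.pack("<dq", state.total, state.count)


def deserialfunc(data: bytes) -> AvgState:
    # bytea -> INTERNAL: rebuild the in-memory state in the receiving process.
    total, count = struct.unpack("<dq", data)
    return AvgState(total, count)


# A worker would serialise its partial state and the leader would
# deserialise it before handing it to the combinefunc.
worker_state = AvgState(total=6.0, count=3)
leader_state = deserialfunc(serialfunc(worker_state))
```

A state that already has a database type would skip both functions and
travel as an ordinary value.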

> 2. can the result of combinefunc ever be fed to sfunc as opposed to other combiners or finalfunc?

combinefuncs take 2 STYPEs, so it's not valid to pass those to an
SFUNC (an SFUNC is only given a BASETYPE value to transition into the
aggregate state).  The finalfunc will be called (if it exists) during
the Finalize Aggregate plan node.  The Finalize Aggregate node also
gathers the intermediate aggregate states from the parallel workers
and calls the combinefunc on states belonging to the same group, so
yes, the finalfunc will be called on aggregate states that have been
combined with the combinefunc.
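
As a rough model of that flow (a Python sketch, not PostgreSQL code),
take an average aggregate whose state is a (sum, count) pair.  The
names partial_aggregate and finalize_aggregate are hypothetical helpers
standing in for the worker-side Partial Aggregate and the leader-side
Finalize Aggregate nodes.

```python
def sfunc(state, value):
    # Transition function: folds one input value into the state.
    total, count = state
    return (total + value, count + 1)


def combinefunc(a, b):
    # Combine function: merges two partial states (two STYPEs).
    return (a[0] + b[0], a[1] + b[1])


def finalfunc(state):
    # Final function: turns the state into the aggregate's result.
    total, count = state
    return total / count if count else None


def partial_aggregate(rows):
    """Worker side: rows are (group_key, value) pairs; only sfunc runs here."""
    states = {}
    for key, value in rows:
        states[key] = sfunc(states.get(key, (0, 0)), value)
    return states


def finalize_aggregate(worker_results):
    """Leader side: combine per-group states, then apply finalfunc."""
    combined = {}
    for states in worker_results:
        for key, state in states.items():
            if key in combined:
                combined[key] = combinefunc(combined[key], state)
            else:
                combined[key] = state
    return {key: finalfunc(state) for key, state in combined.items()}


worker1 = partial_aggregate([("a", 1), ("b", 10)])
worker2 = partial_aggregate([("a", 3), ("b", 20), ("a", 5)])
result = finalize_aggregate([worker1, worker2])
```

In this model the sfunc never sees a combined state; combined states
only ever flow into further combinefunc calls or the finalfunc.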

> I have in mind a scenario, when a different data structure is used in the combine stage. For that it would be good if the conversion can happen in serialfunc-deserialfunc, and the combiner does not even know about the other structure used for state transition during aggregation. If that is the case, the only problem remains with the finalfunc. It has to be ready to receive both types.

What's the use case for that?

David