Re: "Big O" notation for postgres?

Gregory Stark <stark@xxxxxxxxxxxxxxxx> · Thu, 22 May 2008 16:59:46 -0400

"Richard Huxton" <dev@xxxxxxxxxxxx> writes:

> Jonah H. Harris wrote:
>> On Wed, May 21, 2008 at 10:10 AM, H. Hall <hhall1001@xxxxxxxxxxxxxx> wrote:
>>> Does anyone know if there is a source that provides "Big O" notation for
>>> postgres's aggregate functions and operations?  For example is count(*) =
>>> O(1) or O(n)?
>>
>> I don't know of any document containing the complexity of each
>> aggregate, but it's sometimes left as a comment in the souce code.
>
> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
> presence of an index too, just to muddy the waters.

Hm, true. But excluding those special cases all Postgres aggregate functions
will be O(n) unless they're doing something very odd. None of the built-in
functions (except min/max as noted) do anything odd like that.

The reason way is because of the basic design of Postgres aggregate functions.
They are fed every tuple one by one and have to keep their state in a single
variable. Most of the aggregate functions like count(*) etc just keep a static
non-growing state and the state transition function is a simple arithmetic
operation which is O(1). So the resulting operation is O(n).

Actually one exception would be something like

CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND='{}');

Since the state variable has to keep accumulating more and more stuff the
array_append becomes more and more expensive (it has to generate a new array
so it has to copy the existing stuff). So actually it woul dbe O(n^2).

The only builtin aggregate which looks like it falls in this category would be
xmlagg()

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!