Hello,
I didn't try it, but following should be slightly
faster:
COUNT( CASE WHEN field >= x AND field < y THEN true
END)
intead of
intead of
SUM( CASE WHEN field >= x AND field < y THEN 1 ELSE 0
END)
HTH,
Marc
Mamin
From: pgsql-performance-owner@xxxxxxxxxxxxxx [mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of Nikolas Everett
Sent: Thursday, October 22, 2009 4:48 AM
To: Doug Cole
Cc: pgsql-performance
Subject: Re: optimizing query with multiple aggregates
SELECT SUM(CASE
WHEN field >= 0 AND field < 10 THEN 1 ELSE 0 END) as
zeroToTen,
SUM(CASE WHEN
field >= 10 AND field < 20 THEN 1 ELSE 0 END) as
tenToTwenty,
SUM(CASE WHEN field >= 20 AND field < 30 THEN 1 ELSE 0
END) as tenToTwenty,
...
FROM
bigtable
My guess is this
forcing a whole bunch of if checks and your getting cpu bound. Could you
try something like:
SELECT SUM(CASE
WHEN field >= 0 AND field < 10 THEN count ELSE 0 END) as
zeroToTen,
SUM(CASE WHEN
field >= 10 AND field < 20 THEN count ELSE 0 END) as
tenToTwenty,
SUM(CASE WHEN field >= 20 AND field < 30 THEN count
ELSE 0 END) as tenToTwenty,
...
FROM (SELECT
field, count(*) FROM bigtable GROUP BY field)
which will allow a hash
aggregate? You'd do a hash aggregate on the whole table which should be
quick and then you'd summarize your bins.
This all supposes that
you don't want to just query postgres's column statistics.
On Wed, Oct 21, 2009 at 10:21 PM, Doug Cole <dougcole@xxxxxxxxx> wrote:
On Wed, Oct 21, 2009 at 5:39 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:Unless I'm misunderstanding you, that would require breaking each bin
>
> On Wed, Oct 21, 2009 at 6:51 PM, Doug Cole <dougcole@xxxxxxxxx> wrote:
> > I have a reporting query that is taking nearly all of it's time in aggregate
> > functions and I'm trying to figure out how to optimize it. The query takes
> > approximately 170ms when run with "select *", but when run with all the
> > aggregate functions the query takes 18 seconds. The slowness comes from our
> > attempt to find distribution data using selects of the form:
> >
> > SUM(CASE WHEN field >= x AND field < y THEN 1 ELSE 0 END)
> >
> > repeated across many different x,y values and fields to build out several
> > histograms of the data. The main culprit appears to be the CASE statement,
> > but I'm not sure what to use instead. I'm sure other people have had
> > similar queries and I was wondering what methods they used to build out data
> > like this?
>
> have you tried:
>
> count(*) where field >= x AND field < y;
>
> ??
>
> merlin
into a separate sql statement and since I'm trying to calculate more
than 100 bins between the different fields any improvement in the
aggregate functions would be overwhelmed by the cost of the actual
query, which is about 170ms.
Thanks,
Doug
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance