Hi Aidan,

thanks a lot for this detailed summary!

So, why I'm not using prepare here: let's say I'm testing the worst
stress case :-) Imagine you have thousands of queries like this - you
cannot prepare all of them! :-) Or maybe you'll prepare each one once,
but as I showed previously in this thread, the prepare statement itself
takes 16ms, so for a single shot there is no gain! :-)

Stressing with short and simple queries like these (and again, they
have joins - it could be even simpler :-)) gives me a result that shows
my guaranteed worst case - I know then that if I have to deploy a
bombarding OLTP-like application, my database engine will be able to
keep up with such a workload, and if I have performance problems, they
are inside the application! :-) (Well, it's very simplistic, but it's
not far from the truth :-))

Now, as you can see from your explanation, Part #2 is the most dominant
- so instead of blaming this query, why not implement a QUERY PLANNER
CACHE??? - so that if any *similar* query is recognized by the parser,
we simply *reuse* the same plan?..

Rgds,
-Dimitri

On 5/11/09, Aidan Van Dyk <aidan@xxxxxxxxxxx> wrote:
> * Dimitri <dimitrik.fr@xxxxxxxxx> [090511 11:18]:
>> Folks, it's completely crazy, but here is what I found:
>>
>> - if the HISTORY table is analyzed with target 1000, my SELECT
>> response time jumps to 3ms, and the max throughput is limited to
>> 6.000 TPS (that's what happened with 8.3.7)
>>
>> - if the HISTORY table is analyzed with target 5, my SELECT response
>> time decreases to 1.2ms (!) and then my max TPS level is ~12.000!
>> and CPU is used up to 95% even by 8.3.7 :-) and 8.4 performed better
>> just because I left its analyze target at the default value of 100.
>>
>> Can anyone explain to me why the analyze target can have such a huge
>> negative secondary effect?..
>
> It's actually pretty straightforward.
>
> The PostgreSQL query planner is a "smart planner". It takes into
> consideration all the statistics available on the columns/tables,
> expected outputs based on inputs, etc., to choose what it thinks will
> be the best plan. The more data you have in statistics (the larger
> the statistics target you have), the more CPU time it will use and
> the longer it's going to take to "plan" your queries. The tradeoff is
> hopefully better plans.
>
> But in your scenario, where you are hitting the database in the
> absolute worst possible way to use PostgreSQL - with small, repeated,
> simple queries - you're not getting the advantage of "better" plans.
> In your case, you're throwing absolutely simple queries at PG as fast
> as you can, and for each query, PostgreSQL has to:
>
> 1) Parse the given "query string"
> 2) Given the statistics available, plan the query and pick the best one
> 3) Actually run the query.
>
> Part 2 is going to dominate the CPU time in your tests - more so the
> more statistics it has to evaluate - and, unless the data has to come
> from disk (i.e. it's not in shared buffers or the cache), it is thus
> going to dominate the time before you get your results. More
> statistics means more time needed to do the planning/picking of the
> query.
>
> If you were to use prepared statements, the cost of #1 and #2 is paid
> once, and then every time you throw a new execution of the query at
> PostgreSQL, you just do #3, the easy, quick part - especially for
> small, simple queries where all the data is in shared buffers or the
> cache.
>
> a.
>
> --
> Aidan Van Dyk                                          Create like a god,
> aidan@xxxxxxxxxxx                                    command like a king,
> http://www.highrise.ca/                                work like a slave.
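
For reference, here is a minimal sketch of the two knobs discussed
above: lowering the per-column statistics target and using
PREPARE/EXECUTE. The table and column names (history, ref_object, stat)
and the query shape are illustrative placeholders only, not the exact
benchmark schema or query from this thread:

    -- Per-column statistics target: a smaller target makes ANALYZE keep a
    -- smaller histogram/MCV list, so step #2 (planning) has less to evaluate.
    -- (Table and column names are placeholders.)
    ALTER TABLE history ALTER COLUMN ref_object SET STATISTICS 5;
    ANALYZE history;

    -- Prepared statement: parse (#1) and plan (#2) are paid once at PREPARE
    -- time; each subsequent EXECUTE only performs step #3.
    PREPARE hist_sel (integer) AS
        SELECT h.horder, s.name
          FROM history h
          JOIN stat s ON s.ref = h.ref_stat
         WHERE h.ref_object = $1
         ORDER BY h.horder;

    EXECUTE hist_sel (1);

As Dimitri notes above, this only pays off when the same session
executes the prepared query many times; for a single shot, the PREPARE
cost is not recovered.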