Re: Improving Query

On Oct 30, 2007, at 9:23 AM, Richard Huxton wrote:

Ketema wrote:
I have the following query that is a part of a function:

Yikes! Difficult to get a clear view of what this query is doing.
It seems complicated because you are only seeing a small subset of the schema these tables tie into.
I'd be happy to share the whole thing if it is needed.

OK, I'm assuming you're vacuumed and analysed on all these tables...
Yes.  Auto-vacuum is on and I do a full VACUUM every 2 days.


My concern is with the sort step that takes 15 seconds by itself:
-> Sort (cost=1235567017.53..1238002161.29 rows=974057502 width=290)
(actual time=16576.997..16577.513 rows=3366 loops=1)

That's taking hardly any time itself: the startup time is already 16576.997, so nearly all of that 16.5 seconds is spent producing the sort's input rather than sorting. Of course, the row estimate is *way* out of line.
OK. I misread the plan and took the startup time as the time it took to perform the operation. Thanks for the link to explain analyze.

If you look here (where the explain is a bit easier to see)
http://explain-analyze.info/query_plans/1258-query-plan-224

The two main things to look at seem to be the nested loops near the top and, a few lines down, the materialise (cost=326...

These two nested loops seem to be pushing the row estimates wildly out of reality. They also consume much of the time.

The immediate thing that leaps out here is that you are trying to join an int to an array of ints. Why are you using this setup rather than a separate table?
I see what you are talking about. When I initially used this setup, it was to avoid a table with a ton of rows in it that I knew I would have to join against often. So I made a column that holds, on average, 4 or 5 ints representing the "products" on a particular "order". I did not realize that using a function in the join would be worse than simply having a large table.
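
For reference, the separate-table layout would look something like the sketch below; orders, products and order_products are invented names here, not the actual schema:

CREATE TABLE order_products (
    order_id    integer NOT NULL REFERENCES orders (id),
    product_id  integer NOT NULL REFERENCES products (id),
    PRIMARY KEY (order_id, product_id)
);
-- second index so joins starting from the product side are also indexed
CREATE INDEX order_products_product_idx ON order_products (product_id);

-- the join becomes a plain indexed equijoin the planner can estimate,
-- instead of a function call against an int[] column:
SELECT o.id, p.name
  FROM orders o
  JOIN order_products op ON op.order_id = o.id
  JOIN products p ON p.id = op.product_id;

With only 4 or 5 products per order the junction table stays fairly small, and the planner gets real statistics to work with.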

How can I improve this step?
Things I have thought about:
1)Creating indexes on the aggregates...Found out this can't be done.

Nope - not sure what it would mean in any case.
My initial thought was that the counts were causing the slowdown. This is not the issue, though, as you have shown.

2)Create Views of the counts and the sub select...is this any faster
as the view is executed at run time anyway?

Might make the query easier to write, but it won't make it faster. Not without materialised views, which are the fancy name for #3...

3)Create actual tables of the sub select and aggregates...How would
this be maintained to ensure it was always accurate?

Triggers.
Because of the way this system is used I may take this route, as I think it will involve fewer changes.
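
Something like this sketch is what I have in mind, reusing the invented order_products table from above plus a made-up order_counts summary table (plpgsql that should work on 8.1):

CREATE TABLE order_counts (
    order_id       integer PRIMARY KEY,
    product_count  integer NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION maintain_order_counts() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE order_counts SET product_count = product_count + 1
         WHERE order_id = NEW.order_id;
        IF NOT FOUND THEN
            INSERT INTO order_counts (order_id, product_count)
            VALUES (NEW.order_id, 1);
        END IF;
    ELSE  -- DELETE
        UPDATE order_counts SET product_count = product_count - 1
         WHERE order_id = OLD.order_id;
    END IF;
    RETURN NULL;  -- AFTER trigger, so the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_counts_trg
    AFTER INSERT OR DELETE ON order_products
    FOR EACH ROW EXECUTE PROCEDURE maintain_order_counts();

The query then reads pre-computed counts instead of aggregating at run time; the cost moves to write time instead.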

4)Increasing hardware resources.  Currently the box is a single-processor
amd64 with 8 GB of RAM.  Below are the settings for resource
usage.
shared_buffers = 65536
temp_buffers = 5000
max_prepared_transactions = 2000

????
These are settings out of postgresql.conf. Currently sysctl.conf is set to kernel.shmmax = 805306368.
Connections are set at 300 and I usually have about 200 connections open.
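
For reference, shared_buffers in 8.1 is counted in 8 kB pages, so those numbers work out to:

    shared_buffers = 65536 x 8 kB = 512 MB
    kernel.shmmax  = 805306368 bytes = 768 MB

so the shared memory segment fits under the kernel limit, though 512 MB is only about a sixteenth of the 8 GB of RAM.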

work_mem = 131072
maintenance_work_mem = 512000

Can't say about these without knowing whether you've got only one connection or 100.
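As a rough worst case with the figures above (work_mem applies per sort/hash step, not per server):

    work_mem = 131072 kB = 128 MB per sort
    200 connections x 128 MB = ~25 GB

which is far beyond the 8 GB of RAM if many backends sort at once, and a single complex query can use several multiples of work_mem.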

max_stack_depth = 7168
max_fsm_pages = 160000
max_fsm_relations = 4000
The only function of this box is for Pg, so I do not mind it using
every last drop of RAM and resources that it can.
5)Upgrade version of pg... it is currently running 8.1.4.

Well every version gets better at planning, so it can't hurt.
At one point I did go to 8.2.3 on a dev box and performance was horrible. I have not had the opportunity to work out how to make the postgresql.conf file in 8.2 match the settings in 8.1, as some things have changed.

--
  Richard Huxton
  Archonet Ltd


