Re: experiments in query optimization

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




[If Kevin Grittner reads this, please fix your email address. I am getting bounces from your email address.]

On Tue, 30 Mar 2010, Robert Haas wrote:

On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <faheem@xxxxxxxxxxxxx> wrote:
Sure, but define sane setting, please. I guess part of the point is that I'm
trying to keep memory low, and it seems this is not part of the planner's
priorities. That it, it does not take memory usage into consideration when
choosing a plan. If that it wrong, let me know, but that is my
understanding.

I don't understand quite why you're confused here.  We've already
explained to you that the planner will not employ a plan that uses
more than the amount of memory defined by work_mem for each sort or
hash.

Typical settings for work_mem are between 1MB and 64MB.  1GB is enormous.

I don't think I am confused. To be clear, when I said "it does not take memory usage into consideration' I was talking about overall memory usage. Let me summarize:

The planner will choose the plan with the minimum total cost, with the constraint that the number of memory used for each of certain steps is less than work_mem. In other words with k such steps it can use at most

k(plan)*work_mem

memory where k(plan) denotes that k is a function of the plan. (I'm assuming here that memory is not shared between the different steps). However, k(plan)*work_mem is not itself bounded. I fail to see how reducing work_mem significantly would help me. This would mean that the current plans I am using would likely be ruled out, and I would be left with plans which, by definition, would have larger cost and so longer run times. The current runtimes are already quite long - for the PED query, the best I can do with work_mem=1 GB is 2 1/2 hrs, and that is after splitting the query into two pieces.

I might actually be better off *increasing* the memory, since then the planner would have more flexibility to choose plans where the individual steps might require more memory, but the overall memory sum might be lower.

You might need to create some indices, too.

Ok. To what purpose? This query picks up everything from the
tables and the planner does table scans, so conventional wisdom
and indeed my experience, says that indexes are not going to be so
useful.

There are situations where scanning the entire table to build up a
hash table is more expensive than using an index.  Why not test it?

Certainly, but I don't know what you and Robert have in mind, and I'm not
experienced enough to make an educated guess. I'm open to specific
suggestions.

Try creating an index on geno on the columns that are being used for the join.

Ok, I'll try that. I guess the cols in question on geno are idlink_id and anno_id. I thought that I already had indexes on them, but no. Maybe I had indexes, but removed them.

If I understand the way this works, if you request, say an INNER JOIN, the planner can choose different ways/algorithms to do this, as in http://en.wikipedia.org/wiki/Join_(SQL)#Nested_loops . It may choose a hash join, or an nested loop join or something else, based on cost. If the indexes don't exist that may make the inner loop join more expensive, so tip the balance in favor of using a hash join. However, I have no way to control which option it chooses, short of disabling eg. the hash join option, which is not an option for production usage anyway. Correct?

                                                          Regards, Faheem.
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux