Re: Bad plan choice nestloop vs. hashjoin

Kenneth Marshall <ktm@xxxxxxxx> · Mon, 18 Jan 2010 12:57:44 -0600

On Mon, Jan 18, 2010 at 12:13:24PM -0500, Tom Lane wrote:
> Kenneth Marshall <ktm@xxxxxxxx> writes:
> > We have just upgraded our monitoring server software and
> > now the following query for graphing the data performs
> > abysmally with the default settings. Here is the results
> > of the EXPLAIN ANALYZE run with nestloops enabled:
> 
> I poked at this a bit more and now think I see where the problem is.
> The thing that would be easiest for you to do something about is
> the misestimation here:
> 
> >                            ->  Nested Loop Anti Join  (cost=94.07..10294.64 rows=1 width=8) (actual time=98.049..27907.702 rows=281 loops=1)
> >                                  Join Filter: (gii.graphid = g.graphid)
> >                                  ->  Bitmap Heap Scan on graphs g  (cost=94.07..233.17 rows=1 width=8) (actual time=0.529..1.772 rows=281 loops=1)
> >                                        Recheck Cond: (graphid = ANY ('{2,3,4,5,386,387,969,389,971,972,973,446,447,448,449,450,451,471,456,470,473,477,472,474,475,476,478,479,480,481,482,483,484,459,614,655,658,645,490,492,489,493,496,495,498,497,499,501,500,502,974,558,559,562,566,563,564,565,567,568,569,570,571,535,572,573,534,536,538,539,540,541,542,543,544,545,537,546,547,548,552,553,554,555,556,549,550,551,557,577,578,579,580,574,576,581,835,587,588,589,590,560,561,836,591,592,593,594,595,827,389,495,498,497,597,598,599,975,978,999,1004,604,605,606,679,616,634,635,636,637,638,618,629,630,631,632,633,671,682,669,670,678,679,680,674,672,676,673,675,677,681,682,683,683,644,652,829,681,687,698,685,686,705,706,707,708,830,945,946,710,716,712,714,713,709,718,721,720,719,723,724,747,749,750,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,772,774,775,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,777,776,977,824,823,826,825,829,832,833,835,581!
>  ,836,842,852,854,839,840,838,853,855,847,848,944,846,859,850,899,901,902,903,864,865,866,867,976,979,939,941,942,943,906,907,908,909,910,868,969,991,950,955,964,966,952,953,962,965,967,959,961,968,1001,1002,1003,986,987,988,994,995,996,1008,1006,1007,1009,1010}'::bigint[]))
> >                                        Filter: ((graphid / 100000000000000::bigint) = 0)
> >                                        ->  Bitmap Index Scan on graphs_pkey  (cost=0.00..94.07 rows=246 width=0) (actual time=0.507..0.507 rows=294 loops=1)
> >                                              Index Cond: (graphid = ANY ('{2,3,4,5,386,387,969,389,971,972,973,446,447,448,449,450,451,471,456,470,473,477,472,474,475,476,478,479,480,481,482,483,484,459,614,655,658,645,490,492,489,493,496,495,498,497,499,501,500,502,974,558,559,562,566,563,564,565,567,568,569,570,571,535,572,573,534,536,538,539,540,541,542,543,544,545,537,546,547,548,552,553,554,555,556,549,550,551,557,577,578,579,580,574,576,581,835,587,588,589,590,560,561,836,591,592,593,594,595,827,389,495,498,497,597,598,599,975,978,999,1004,604,605,606,679,616,634,635,636,637,638,618,629,630,631,632,633,671,682,669,670,678,679,680,674,672,676,673,675,677,681,682,683,683,644,652,829,681,687,698,685,686,705,706,707,708,830,945,946,710,716,712,714,713,709,718,721,720,719,723,724,747,749,750,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,772,774,775,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,777,776,977,824,823,826,825,829,832,833,835!
>  ,581,836,842,852,854,839,840,838,853,855,847,848,944,846,859,850,899,901,902,903,864,865,866,867,976,979,939,941,942,943,906,907,908,909,910,868,969,991,950,955,964,966,952,953,962,965,967,959,961,968,1001,1002,1003,986,987,988,994,995,996,1008,1006,1007,1009,1010}'::bigint[]))
> 
> The estimate of the ANY condition is not too bad (246 vs 294 actual).  But it
> hasn't got any ability to deal with the "(graphid / 100000000000000::bigint) = 0"
> filter condition, and is falling back to a default selectivity estimate for
> that, which IIRC is just 0.005 --- but actually, that condition doesn't
> eliminate any rows at all.  Do you need that condition in the first
> place?  Can you persuade your client-side software to eliminate it when
> it's impossible based on the ANY list?  Or at least recast it to
> something more easily estimatable, like "graphid < 100000000000000"?
> 
> If you really have to have the condition just like that, I'd advise
> creating an index on "(graphid / 100000000000000::bigint)".  That would
> cause ANALYZE to accumulate statistics on that expression, which'd
> result in a far better estimate.
> 
> The reason that this misestimate hammers it so hard is that the
> inside of the nestloop looks like
> 
> >                                  ->  Nested Loop  (cost=0.00..17449.43 rows=1954 width=8) (actual time=99.304..99.304 rows=0 loops=281)
> >                                        ->  Index Scan using graphs_items_2 on graphs_items gii  (cost=0.00..69.83 rows=1954 width=16) (actual time=0.013..3.399 rows=1954 loops=281)
> >                                        ->  Index Scan using items_pkey on items ii  (cost=0.00..8.88 rows=1 width=8) (actual time=0.046..0.046 rows=0 loops=549074)
> >                                              Index Cond: (ii.itemid = gii.itemid)
> >                                              Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
> >                                              SubPlan 1
> >                                                ->  Nested Loop  (cost=0.00..7.83 rows=1 width=0) (actual time=0.040..0.040 rows=0 loops=549074)
> >                                                      Join Filter: (rr.groupid = ugg.usrgrpid)
> >                                                      ->  Nested Loop  (cost=0.00..6.67 rows=1 width=8) (actual time=0.037..0.037 rows=0 loops=549074)
> >                                                            Join Filter: (hgg.groupid = rr.id)
> >                                                            ->  Index Scan using hosts_groups_1 on hosts_groups hgg  (cost=0.00..4.27 rows=1 width=8) (actual time=0.003..0.005 rows=1 loops=549074)
> >                                                                  Index Cond: ($0 = hostid)
> >                                                            ->  Seq Scan on rights rr  (cost=0.00..2.39 rows=1 width=16) (actual time=0.027..0.027 rows=0 loops=532214)
> >                                                                  Filter: (rr.permission < 2)
> >                                                      ->  Seq Scan on users_groups ugg  (cost=0.00..1.15 rows=1 width=8) (never executed)
> >                                                            Filter: (ugg.userid = 20)
> >                                              SubPlan 2
> >                                                ->  Nested Loop  (cost=2.34..8.13 rows=1 width=8) (never executed)
> >                                                      ->  Nested Loop  (cost=0.00..3.55 rows=1 width=8) (never executed)
> >                                                            Join Filter: (rr.groupid = ugg.usrgrpid)
> >                                                            ->  Seq Scan on rights rr  (cost=0.00..2.39 rows=1 width=16) (never executed)
> >                                                                  Filter: (permission < 2)
> >                                                            ->  Seq Scan on users_groups ugg  (cost=0.00..1.15 rows=1 width=8) (never executed)
> >                                                                  Filter: (ugg.userid = 20)
> >                                                      ->  Bitmap Heap Scan on hosts_groups hgg  (cost=2.34..4.45 rows=11 width=16) (never executed)
> >                                                            Recheck Cond: (hgg.groupid = rr.id)
> >                                                            ->  Bitmap Index Scan on hosts_groups_2  (cost=0.00..2.33 rows=11 width=0) (never executed)
> >                                                                  Index Cond: (hgg.groupid = rr.id)
> 
> The alternative subplans are variant implementations of the inner EXISTS
> test.  Unfortunately, it's choosing to go with the "retail" lookup, not
> knowing that that's going to wind up being executed 549074 times.
> If it had gone to the "wholesale" hashtable implementation, this would
> probably have run a whole lot faster.  (We have it in mind to allow the
> executor to switch to the other implementation on-the-fly when it
> becomes clear that the planner misjudged the rowcount, but it's not done
> yet.)
> 
> However, I'm curious about your statement that this used to perform
> better.  Used to perform better on what, and what was the plan back
> then?  I don't believe that pre-8.4 PG would have done better than
> 8.4 on either of these points.  It certainly wouldn't have won on the
> EXISTS subplan, because before 8.4 that would always have been done in
> the "retail" style.
> 
> 			regards, tom lane
> 

Hi Tom,

Your update beat me. I was just about to send you the trimmed example.
The performance problem was not present previously because this code
is new in this release, a .0 release. This section of code is completely
new. I will post a bug report to the developers with your analysis
and hopefully they can include the fix in the .1 release. I am going
to create the index so I do not have to track source code changes
to their application. The performance is still 2X slower than with
enable_nestloop set to off, but that is great compared to 200X slower
without the index:

CREATE INDEX fix_bad_stat_estimate ON graphs ((graphid / 100000000000000::bigint));

Thank you again.

Regards,
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance