Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

Reviving a thread from two months ago...Francisco asked about a query that was much slower on 8.4 than on 8.3 at http://archives.postgresql.org/message-id/cone.1288183283.263738.5695.1000@shelca

There was some theorizing about a stats problem. I've now pulled pg_stats data from the production server, and that alone doesn't seem to be enough to account for what's happening. Robert asked a good question at http://archives.postgresql.org/message-id/AANLkTi=VotmtMz5tYcnQ_5nDhZmsvp_9oYq43FF+qjnn@xxxxxxxxxxxxxx that never was followed up on. That specifically is what I'm still confused about, even after collecting a lot more data from this system, so continuing from there:

Robert Haas wrote:
This part looks really strange to me.  Here we have a nested loop
whose outer side is estimated to produce 33 rows and whose [inner] side
is estimated to produce 2 rows.  Given that, one would think that the
estimate for the loop as a whole shouldn't be more than 33 * 2 = 66
rows (or maybe a bit more if 33 is really 33.4999 rounded down, and 2
is really 2.49999 rounded down).   But the actual estimate is 5 orders
of magnitude larger. How is that possible?

That part stuck out to me too. I have no idea how that particular bit of Nested Loop ends up with that giant count for estimated output rows. I thought it would be about 33 * 2 too, so how is that turning into an estimate of 2434522 output rows? I believe this is the part that's executing "cards.useraccount IN".

Here's a cut-down subset of the query that just does the suspicious part without any nodes above it, run on the 8.4 system having the problem. This is basically the same plan that was seen in a lower-level node before, just the number of matching rows for the one loop has gone from 33 to 34 due to more data being added to the table since the previous run. I think it's easier to follow the logic it's trying to execute when simplified like this:

SELECT
 cards.membid
FROM cards
WHERE cards.useraccount IN
   (SELECT useraccount FROM cards WHERE membid IN
     (SELECT membid FROM members WHERE commonid = 3594)
   )
;

 Nested Loop  (cost=303.01..742.28 rows=2469120 width=4) (actual time=0.066..0.082 rows=2 loops=1)
   ->  HashAggregate  (cost=303.01..303.35 rows=34 width=29) (actual time=0.048..0.048 rows=2 loops=1)
         ->  Nested Loop  (cost=11.86..302.93 rows=34 width=29) (actual time=0.034..0.045 rows=2 loops=1)
               ->  HashAggregate  (cost=11.86..12.20 rows=34 width=4) (actual time=0.023..0.024 rows=2 loops=1)
                     ->  Index Scan using members_commonid on members  (cost=0.00..11.77 rows=34 width=4) (actual time=0.014..0.016 rows=2 loops=1)
                           Index Cond: (commonid = 3594)
               ->  Index Scan using cards_membid on cards  (cost=0.00..8.54 rows=1 width=33) (actual time=0.009..0.010 rows=1 loops=2)
                     Index Cond: (public.cards.membid = members.membid)
   ->  Index Scan using cards_useraccount on cards  (cost=0.00..12.88 rows=2 width=33) (actual time=0.015..0.016 rows=1 loops=2)
         Index Cond: (public.cards.useraccount = public.cards.useraccount)

It's possible to rewrite this whole thing using a join instead of IN, and sure enough doing so gives a better plan. That's how they worked around this being a crippling issue. I'm still suspicious of what caused such a performance regression from 8.3, though, where this query executed so much better.
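For reference, a join form of the cut-down query above might look something like this (an untested sketch using the anonymized names from this thread, not necessarily the exact rewrite they deployed; the DISTINCT is there to avoid duplicating outer rows when a useraccount matches more than once, so it's close to but not identical in multiplicity to the IN form):

SELECT DISTINCT c.membid
FROM cards c
JOIN cards c2 ON c2.useraccount = c.useraccount
JOIN members m ON m.membid = c2.membid
WHERE m.commonid = 3594;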

Stepping back from that part of the query for a second, the main time-related difference between the 8.3 and 8.4 plans involves how much of the members table gets scanned. When 8.3 looks up a matching item in that table, in order to implement this part of the larger query:

WHERE members.membid = cards.membid AND members.membershipid = 40

It uses the primary key index on membid for a quick lookup, then filters the result on membershipid:

-> Index Scan using members_pkey on members (cost=0.00..0.70 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=2)
     Index Cond: (public.members.membid = public.cards.membid)
     Filter: (public.members.membershipid = 40)

8.4 is scanning the whole table instead:

-> Seq Scan on members (cost=0.00..121379.95 rows=2434956 width=4) (actual time=0.024..1085.143 rows=2435153 loops=1)
     Filter: (membershipid = 40)

That gives you essentially every single member ID available, to then match against in a hash join. The filter on membershipid isn't considered selective at all. I'm not sure why 8.4 isn't also recognizing the value of restricting on membid here, which would reduce the number of output rows that come out of that scan.

Is the mis-estimation in the Nested Loop part causing this sequential scan to happen, because there are so many more potential values to join against in the estimate than in reality? If that's the case, it just comes full circle back to how the node already discussed above gets its estimate.
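One thing that might help answer that is to probe members the same way the 8.3 inner node did, with a single membid plus the membershipid filter, and confirm the per-row index lookup is still cheap on 8.4. This is only a spot check, and 12345 is a made-up placeholder for some membid value that actually exists:

EXPLAIN ANALYZE
SELECT membid
FROM members
WHERE membid = 12345 AND membershipid = 40;

If that comes back as a cheap index scan on members_pkey, then the switch to the sequential scan is presumably being driven entirely by the inflated outer row count rather than by the cost of the probe itself.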

While there are some statistics anomalies due to data skew on the production system, I don't see how they could explain this Nested Loop row explosion. I can tell you in detail why some of the lower-level data is misestimated by a single order of magnitude. For example, if you focus on this inner part:

SELECT useraccount FROM cards WHERE membid IN
 (SELECT membid FROM members WHERE commonid = 3594);

 ->  Nested Loop  (cost=11.33..293.71 rows=33 width=29) (actual time=145.940..169.543 rows=2 loops=1)
       ->  HashAggregate  (cost=11.33..11.66 rows=33 width=4) (actual time=64.730..64.732 rows=2 loops=1)
             ->  Index Scan using members_commonid on members  (cost=0.00..11.25 rows=33 width=4) (actual time=64.688..64.703 rows=2 loops=1)
                   Index Cond: (commonid = 3594)
       ->  Index Scan using cards_membid on cards  (cost=0.00..8.53 rows=1 width=33) (actual time=52.400..52.401 rows=1 loops=2)
             Index Cond: (public.cards.membid = public.members.membid)

The index scan on members_commonid here is estimating 33 rows when there are actually 2 that match. Looking at the table stats for this relation, the distribution is a bit skewed because 99.7% of the rows are set to the sole named MCV: the value "0", which is used as a flag for "no data" here instead of NULL (a standard legacy system import compatibility issue). My guess is that the 250 points of histogram data aren't quite enough to capture the distribution of the non-zero values very well in the wake of that, so the estimate is off by a factor of ~16. That alone isn't enough of an error to switch any of the efficient index scans into other forms, though. The actual runtime in this part of the plan isn't suffering that badly from this error; it's more that other plan decisions aren't being made well around it.
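If it ever seems worth chasing that factor of ~16, the obvious experiment is to look at what the planner currently has for commonid, then raise the per-column statistics target and re-analyze. This is just a sketch; 1000 is an arbitrary target, and given the points above the gain here would probably be modest:

-- What the planner currently knows about members.commonid
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'members' AND attname = 'commonid';

-- Gather a finer-grained histogram for just this column
ALTER TABLE members ALTER COLUMN commonid SET STATISTICS 1000;
ANALYZE members;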

--
Greg Smith   2ndQuadrant US    greg@xxxxxxxxxxxxxxx   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us


