Search Postgresql Archives

Re: Inconsistent performance with LIKE and bind variable on long-lived connection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Steven Grimm <sgrimm@xxxxxxxxxxxxxx> writes:
> That seems to fit the behavior. Thanks; I wasn't aware of that feature of
> prepared statements. I changed the Python code to do EXPLAIN ANALYZE
> EXECUTE rather than EXECUTE, and I do indeed see a change in plan after the
> fifth iteration:

> (first five iterations)

> Index Only Scan using test_pkey on test  (cost=0.29..476.29 rows=9999
> width=4) (actual time=0.058..2.439 rows=10000 loops=1)
>   Index Cond: (col1 = 'xyz'::text)
>   Filter: (col2 ~~ '%'::text)
>   Heap Fetches: 0
> Execution time: 2.957 ms

> (iterations 6+)

> Sort  (cost=205.41..205.54 rows=50 width=4) (actual time=104.986..105.784
> rows=10000 loops=1)
>   Sort Key: col2
>   Sort Method: quicksort  Memory: 853kB
>   ->  Seq Scan on test  (cost=0.00..204.00 rows=50 width=4) (actual
> time=0.014..2.100 rows=10000 loops=1)
>         Filter: ((col2 ~~ $2) AND (col1 = $1))
> Execution time: 106.282 ms

> So the problem here may be that the cost estimate for the generic execution
> plan is way off in the case of a LIKE bind variable that matches a large
> number of rows. I did make sure to have the Java code do a VACUUM ANALYZE
> after doing its inserts, just to eliminate lack of statistics as a possible
> explanation. Maybe the incorrect row count estimate (50 instead of 10000)
> is causing it to think the quicksort will be a lot cheaper than it ends up
> being with the actual rows?

Right.  The plancache code is designed to switch to a generic plan if
that doesn't seem to save anything compared to a custom plan that's
built for the specific parameter value(s).  But there's an implicit
assumption in that design that knowledge of the specific values can
only make the plan better not worse.  In this case, knowing the values
causes the planner to know that the plan will scan much more of the
table than its generic estimate would guess, so it ends up with a
higher cost estimate.  But that's not because the plan is "worse",
it's because the rowcount estimate is closer to reality.

I've seen this problem once or twice before, but I'm not entirely sure
what to do about it.  Disentangling the two effects seems hard.  One idea
is to reject the generic plan if it comes out cheaper than the average
custom plan, on the grounds that this must reflect less expensive (but
inaccurate) conclusions about the rowcount not a genuinely better plan.
But I have a feeling that that's too simplistic --- in particular I'm not
sure it does the right thing when the custom plan number is indeed an
average over several different custom plans.  (That is, if you were
throwing a mix of selective and unselective patterns at this query,
so that some of the custom plans were indexscans and some were seqscans,
I'm not very sure what would happen with a rule like that.)

> Interesting that the Java version switches to the suboptimal plan after 9
> iterations rather than 5. I don't know how to get the JDBC driver to do an
> EXPLAIN on a prepared statement, so I can't confirm that the same thing is
> happening there, but it seems plausible. Happy to try that if there's a way
> to do it.

Yeah, I've been watching this thread and trying to figure out how to
explain that part; I suspected a cause of this form but couldn't
make that theory match the 9-iterations observation.  (I still can't.)

			regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux