Re: select distinct runs slow on pg 10.6

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I think Merlin has outlined pretty much all the options and very neatly. (As an asides Merlin could you possibly elaborate on the "C Hack" how that might be accomplished.)

To OP, I am curious if the performance changes were the query rewritten such that all timestamp columns were listed first in the selection. I understand it might not be feasible to make this change in your real application without breaking the contract.

Regards
Dinesh

On Wed, Sep 11, 2019 at 8:54 AM Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
On Mon, Sep 9, 2019 at 3:55 AM yash mehta <yash215@xxxxxxxxx> wrote:
>
> We have a query that takes 1min to execute in postgres 10.6 and the same executes in 4 sec in Oracle database. The query is doing 'select distinct'. If I add a 'group by' clause, performance in postgres improves significantly and fetches results in 2 sec (better than oracle). But unfortunately, we cannot modify the query. Could you please suggest a way to improve performance in Postgres without modifying the query.

Well, here's the bad news.  Postgres doesn't optimize this specific
formulation as well as oracle does.  Normally tweaking the query along
with some creativity would get the expected result; it's pretty rare
that I can't coerce the planner to do something fairly optimally.  I'm
guessing this is an Oracle conversion app, and we do not have the
ability to change the underlying source code?  Can you elaborate why
not?

In lieu of changing the query in the application, we have  high level
strategies to consider.
*) Eat the 20 seconds, and gripe to your oracle buddies (they will
appreciate this)

*) Mess around with with planner variables to get a better plan.
Unfortunately, since we can't do tricks like SET before running the
query, the changes will be global, and I'm not expecting this to bear
fruit, unless we can have this query be separated from other queries
at the connection level (we might be able to intervene on connect and
set influential non-global planner settings there)

*) Experiment with pg11/pg12 to see if upcoming versions can handle
this strategy better.  pg12 is in beta obviously, but an upgrade
strategy would be the easiest out.

*) Attempt to intervene with views.  I think this is out, since all
the tables are schema qualified. To avoid a global change, the typical
strategy is to tuck some views into a private schema and manipulate
search_path to have them resolve first, but that won't work if you
don't have control of the query string.

*) Try to change the query string anyways.  Say, this is a compiled
application for which you don't have the code,  we might be able to
locate the query text within the compiled binary and modify it.  This
is actually a pretty effective trick (although in many scenarios we'd
want the query string to be the same length as before but you have
plenty of whitespace to play with) although in certain
legal/regulatory contexts we might not be able to do it.

*) Hack some C to adjust the query in flight.   This is *SUPER* hacky,
but let's say that the application was dynamically linked against the
libpq driver, but with some C change and a fearless attitude we could
adjust the query after it leaves the application but before it hits
the database. Other candidate interventions might be in the database
itself or in pgbouncer.  We could also do this in jdbc if your
application connects via that driver.  This is would be 'absolutely
last resort' tactics, but sometimes you simply must find a solution.

merlin



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux