RE: Same query 10000x More Time

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Kyotaro Horiguchi and Vijaykumar Jain,

Thanks for your quick reply!

I understand that the fact the slow query has a join caused this problem.  However, why can't Postgres evaluate the table of the "IN" clause (select 140 as id union select 144  union select 148) and based on its size decide what is more optimal.
Push the local table to the linked server to perform the join on the linked server
Pull the linked server table to local to perform the join on the local.

In my case the table size of the local is million times smaller than the table size of the remote.



select lnk.*
into local_1
from tbl_link lnk
where id in (select 140 as id union select 144  union select 148)


-----Original Message-----
From: Kyotaro Horiguchi [mailto:horikyota.ntt@xxxxxxxxx]
Sent: Thursday, January 6, 2022 11:39 AM
To: vijaykumarjain.github@xxxxxxxxx
Cc: Avi Weinberg <AviW@xxxxxxxxx>; pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: Same query 10000x More Time

At Thu, 6 Jan 2022 13:50:55 +0530, Vijaykumar Jain <vijaykumarjain.github@xxxxxxxxx> wrote in
> On Thu, 6 Jan 2022 at 13:13, Avi Weinberg <AviW@xxxxxxxxx> wrote:
>
> > Hi
> >
> >
> >
> > I have postgres_fdw table called tbl_link.  The source table is 2.5
> > GB in size with 122 lines (some lines has 70MB bytea column, but not
> > the ones I select in the example)
> >
> > I noticed that when I put the specific ids in the list "where id in
> > (140,144,148)" it works fast (few ms), but when I put the same list
> > as select "where id in (select 140 as id union select 144  union select 148)"
> > it takes 50 seconds.  This select union is just for the example, I
> > obviously have a different select (which by itself takes few ms but
> > cause the whole insert query to take 10000x more time)
> >
> >
> >
> > Why is that?  How can I still use regular select and still get
> > reasonable response time?
> >
> >
> >
> > Thanks
> >
> >
> >
>
> couple of things:
> PostgreSQL: Documentation: 14: F.35. postgres_fdw
> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> .postgresql.org%2Fdocs%2Fcurrent%2Fpostgres-fdw.html&amp;data=04%7C01%
> 7Caviw%40gilat.com%7Cc8585d2ddbeb4a09e3e208d9d0f8684c%7C7300b1a3573a40
> 1092a61c65cd85e927%7C0%7C0%7C637770587595033327%7CUnknown%7CTWFpbGZsb3
> d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7
> C3000&amp;sdata=bVBCIOkXrVkkI%2BDH44QmAZmm%2FJLz%2FWYp5Wda%2FrJRfDA%3D
> &amp;reserved=0>
> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> .postgresql.org%2Fdocs%2Fcurrent%2Fpostgres-fdw.html&amp;data=04%7C01%
> 7Caviw%40gilat.com%7Cc8585d2ddbeb4a09e3e208d9d0f8684c%7C7300b1a3573a40
> 1092a61c65cd85e927%7C0%7C0%7C637770587595033327%7CUnknown%7CTWFpbGZsb3
> d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=bVBCIOkXrVkkI%2BDH44QmAZmm%2FJLz%2FWYp5Wda%2FrJRfDA%3D&amp;reserved=0>when you set your foreign server what are your use_remote_estimate fetch_size params for the foreign server.
>
> you need to know there are certain restrictions on what gets pushed
> down to the remote server i generally use postgres/postgres_fdw.sql at
> master * postgres/postgres
> (github.com)
> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit
> hub.com%2Fpostgres%2Fpostgres%2Fblob%2Fmaster%2Fcontrib%2Fpostgres_fdw
> %2Fsql%2Fpostgres_fdw.sql&amp;data=04%7C01%7Caviw%40gilat.com%7Cc8585d
> 2ddbeb4a09e3e208d9d0f8684c%7C7300b1a3573a401092a61c65cd85e927%7C0%7C0%
> 7C637770587595033327%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQI
> joiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=TzqeuCMrThZ
> RUkq9m%2F97N8bRgm9wu3VFjTnoZpt%2BA7w%3D&amp;reserved=0>
> as
> a reference
> if you predicates are not pushed down, it will bring all the rows from
> the foreign server to your local server (and fetch_size value and
> network io will add to delay) and given you used select * , it will be
> a lot of io, so maybe restrict only to columns needed after being
> filtered would help.
>
>
> you can try by running
> explain (verbose,analyze) query  and then also enabling log_statement
> = 'all' / log_min_duration_statement = 0 on the foreign server to see
> the actual plan for the foreign scan.
>
> That might help in trouble shooting.
>
>
> as always, i have little production exposure. If i am wrong, i can be
> corrected.

In this specific case, the FAST query doesn't contain a join and its predicate can be pushed down to remote.  On the other hand the SLOW one contains a join.  The planner considers remote join only when the both hands of a join are on the same foreign server.  Tthis is not the case since the inner subquery is not even a foreign scan.  The planner doesn't consider the possibility that a subquery is executable anywhere.

As the result, the local inevitably draw all rows from remote table to join with the result of the subquery on-local, which should be quite slow.

It could be improved, but I don't think we are going to consider that case because the SLOW query seems like a kind of bad query, which can be improved by rewriting to the FAST one.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


IMPORTANT - This email and any attachments is intended for the above named addressee(s), and may contain information which is confidential or privileged. If you are not the intended recipient, please inform the sender immediately and delete this email: you should not copy or use this e-mail for any purpose nor disclose its contents to any person.






[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux