Assuming your questions as 1,2,3, please find my answers below.
1)"explain" on foreign servers run as "idle in transactions". coz they were running very long (in the order of some minutes) , pgbouncer (in tx level pooling) setting kill them (as idle in tx time limit exceeded of 5 mins) or else results in too many connections piling up.
2)yes, i get those from pg_stat_activity, it truncates the full statement, but it shows up as * EXPLAIN select col1, col2 .... * 00:00:44 | idle in transaction (this is just one of the screenshots i have). (on the foreign side)
3)yes, i think we kind of understood that part (fetch and memory), but i am not sure if that is used as any hint in plan generation too. i am sorry, i did not put auto explain on, on foreign servers, as that required a restart of the server.
(this is the real content of the screenshot ,yes 13 mins), masking the colname and viewname
20678 | 00:13:38.990025 | EXPLAIN SELECT cols from view | idle in transaction
the explain analyze of the same query on the foreign server is in ms.
I am sorry, i am vague about the queries in the email. i cannot reproduce it, as we do not have multiple shards of 500G in my qa environment and i cannot take dump of prod to test that in our test env coz of gdpr :)
but as i said in the mail, we were speculating since limit was not passed, the plans may have been bad. We tricked the foreign server by using a sample column to fake limit push down, and now have improved response times. We made vaccum/analyze very aggressive to ensure stats are never stale after large updates or deletes.
Unless someone can else reproduce, I guess, i'll close this mail. (I'll try to reproduce it myself again, but for now i have less data to share to convince anyone that happened.
Regards,
Vijay
On Sun, Feb 17, 2019 at 11:11 PM Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Mon, Feb 4, 2019 at 2:15 AM Vijaykumar Jain <vjain@xxxxxxxxxxxxx> wrote:now we have some long running queries via FDW that take minutes and get killed explain runs as idle in transaction on remote servers.Are you saying the EXPLAIN itself gets killed, or execution of the plan generated based on the EXPLAIN (issued under use_remote_estimate = true) gets killed? Who is doing the killing, the local side or the foreign side? Can you include verbatim log entries for this?
explain on foreign servers run as "idle in transactions". coz they were running very long (in the order of some minutes) , pgbouncer setting kill them (as idle in tx time limit exceeded of 5 mins) or else results in too many connections piling up.
now the explain forselect * from sh01.view1 keeps running for minutes sometimes,then fetch too keeps running for minutes, although the total rows are < 10000 maybe.idle in transaction | FETCH 10000 FROM c1What is this? Is it from some monitoring tool, or pg_stat_activity, or what? And is it on the local side or the foreign side?
yes, pg_stat_activity, it truncates the full statement, but it shows up as * EXPLAIN select col1, col2 .... * 00:00:44 | idle in transaction (this is just one of the screenshots i have). (on the foreign side)
other questions:also, what is the cost of fetch_size?It will always fetch rows from the foreign server in this sized chunks. A larger fetch_size will have less network latency and computational overhead if many rows are going to be consumed, but also consume more memory on the local server as all rows are stored in memory per each chunk. Also, in the case of a LIMIT, it reads a large number of rows even if most of them may be unneeded. Conceptually, the LIMIT could be used to modify the FETCH downward to match the LIMIT, but that is not implemented. In the case of a view over UNION ALL, I don't think the individual subqueries even know what the global LIMIT is.
Yep, i guess that is where i think the plan may have
I mean given we have a queryselect * from foobar limit 10000; via FDWlimit 10000 does not get pushed.so it seems all rows some to FDW node and then limit is applied?It should not read all rows. It should read as many multiples of fetch_size as needed, which should just be 1 multiple in this case.
Yep, i think we kind of understood that part, but i am not sure if that is used to generate the plan too. i am sorry, i did not put auto explain on, on foreign servers, as that required a restart of the server.
Cheers,Jeff