Jeff,
In my experience the overhead is directly related to the size and the complexity of the tables in the query.
A simple query only referencing a small < 5M table with only a primary key, it isn't noticeable.
A typical complicated query referencing numerous tables ranging in sizes from <5M to > 2.5T each with a primary key and 0 - 30 indices (per table), it can add 30 seconds to many minutes to each run of the query. When some of these queries are re-run constantly it becomes untenable. In one case, manually running analyze on the foreign tables and then running the query it returns in about 30 sec. Setting use_remote_estimate = true made each run return in about 5 minutes.
Unfortunately, manually running analyze on each foreign table in the schema (500+ tables, ranging from < 1M to > 3T) takes more than a day to complete. On the server hosting the tables, the auto analyze is running constantly, as expected. All of the clusters are sitting at PostgreSQL 15.
Hence my desire to find a more performant, less resource intensive way to pass the continuously updated statistics of these tables to the other PostgreSQL clusters holding the foreign table pointers to them.
I know it's anecdotal, but I hope it helps anyway.
rik.
On Thu, Aug 3, 2023 at 10:19 AM Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Tue, Aug 1, 2023 at 9:47 AM richard coleman <rcoleman.ascentgl@xxxxxxxxx> wrote:use_remote_estimate isn't really a solution as it adds way too much overhead and processing time to every query run.Maybe this is the thing which should be addressed. Can you quantify what you see here? How much overhead is being added for each query? Is this principally processing time, or network latency?Since these tables are being continuously analyzed in the database that hosts the data, is there some way that they statistics could be easily passed through the foreign server mechanism to the remote database that's calling the query?Since FDW can cross version boundaries, it is hard to see how this would work. Maybe something could be done for the special case of where the versions match. I think collations/encoding would be a problem, though.What I am hoping for is either:2. add the ability to automatically run analyze on foreign tables just as they are currently run on local tables.That wouldn't work because communication is always initiated on the wrong side. But it should be fairly easy to script something outside of the database which would connect to both, and poll the "foreign" pg_stat_all_tables.last_autovacuum and initiate a local ANALYZE for each table which was recently autoanalyzed on the foreign side.Cheers,Jeff