Hi Ingolf,
Thanks for the additional info. I tried to replicate this, you can find the GitHub Gist at [1], happy to hear your feedback about it.
What I can see from the execution plans in my example is, that the postgres_fdw pushes down that part
Remote SQL: SELECT id_a FROM public.a WHERE ((id_b = 1))
part to the remote. On the remote DB this query can result in either an index-only scan, an index scan or a full table scan. Which method is chosen depends on table size and indexes. Given the nature of postgres_fdw this will be done in any case with a
sequential query. For this part I would claim:
- If the planner expects few rows, it will choose an index-lookup which is a good thing because it effectively reduces the amount of data that needs to be queried. This would make it fast, given this is a sequential scan.
- If the planner expects many rows, it might choose a sequential scan which can be slow depending on the overall size of the table and likely whether it is cached or not.
So, I found the "size" of the query has a serious impact to the execution time. I don't really understand why execution 16*50 takes 16*2 secs only, but executing 1*800 takes about 3000 seconds... The mentioned fetch_size parameter has a positive effect, because one can grab many more rows and return them at the same time. Worst case (and this is just pure assumption), on each new fetch, the query might be re-executed and thus runtime becomes much
more.
Further up in the plan, I see
Remote SQL: SELECT id_a, id_b FROM public.a
which is the "SELECT * FROM my_view" part. Meaning, here it will definitely do a full table scan on remote since it cannot push down the IN condition. I don't really see right now why this query at all is slower than your original form. In my experiment
it is not, but maybe I am doing something wrong in the schema.
One thought would be however, that the full table scan on the remote is more efficient than pushing down the filter and thus it returns faster.
To really figure out more, I would suggest to increase the logging level on your remote server in order to see which queries are really executed. Even better to maybe use auto_explain to fetch plans and see whether these claims apply.
Best,
Sebastian
|