postgres_fdw chooses remote index scan too rarely

Jeremy Finzel <finzelj@xxxxxxxxx> · Wed, 19 Sep 2018 08:11:03 -0500

First of all, my local system which has postgres_fdw installed is this version:
PostgreSQL 9.6.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit

Here are my server configurations (I have tweaked these with no improvement to my issue):
fetch_size=100
fdw_startup_cost=1000
fdw_tuple_cost=100000
use_remote_estimate=true
I am seeing some really odd behavior and I'm not sure how I can most easily help our users to write queries that actually hit an index scan on the remote server.  In one example, I have:
a temp table with 33 rows
text instead of integer data type - which the remote table's data type is
it is freshly analyzed
Joining to the table, doing id IN (list), or doing EXISTS all yield a full remote table scan:

select *
from remote.customers c
inner join abc t on t.customer_id::int=c.customer_id;

select *
from remote.customers c
where c.customer_id in (SELECT customer_id::int FROM abc);

select *
from remote.customers c
where exists (SELECT 1 FROM abc WHERE customer_id::int = c.customer_id);

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Hash Join  (cost=1002.47..1170208178867.64 rows=5851034 width=3113)
   Hash Cond: (c.customer_id = (abc.customer_id)::integer)
   ->  Foreign Scan on customers c  (cost=1000.43..1170208089344.77 rows=11702069 width=902)
   ->  Hash  (cost=1.69..1.69 rows=28 width=7)
         ->  HashAggregate  (cost=1.41..1.69 rows=28 width=7)
               Group Key: (abc.customer_id)::integer
               ->  Seq Scan on abc  (cost=0.00..1.33 rows=33 width=7)
(7 rows)

However, I can get the index scan 2 ways:
converting the temp table data type to int
keeping the data type as is, but adding DISTINCT to the IN list
Neither of these make any sense to me because the planner knows there are only 33 rows in the temp table.  It should always do an index scan.  And why should converting the data type to int be any different than casting it to int?  I understand the planner stats on that field are not for an integer, but it's still only 33 rows.

So these 2 versions get a good plan and actually run very quickly:

select *
from remote.customers c
where c.customer_id in (SELECT DISTINCT customer_id::int FROM abc);

WITH distinctified AS (SELECT DISTINCT customer_id::int FROM abc)

select *
from remote.customers c
INNER JOIN distinctified t on t.customer_id::int=c.customer_id;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=1002.03..5628140.32 rows=5851034 width=3113)
   ->  HashAggregate  (cost=1.58..2.00 rows=28 width=4)
         Group Key: (abc.customer_id)::integer
         ->  Seq Scan on abc  (cost=0.00..1.50 rows=33 width=4)
   ->  Foreign Scan on customers c  (cost=1000.45..201004.91 rows=2 width=902)
(5 rows)

Likewise if I alter the type it works without any special treatment:

ALTER TABLE abc ALTER COLUMN customer_id TYPE int USING customer_id::INT;
ANALYZE abc;

EXPLAIN
select *
from remote.customers c
inner join abc t using (customer_id);

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=1000.45..6633164.02 rows=1930841 width=3194)
   ->  Seq Scan on abc t  (cost=0.00..1.33 rows=33 width=85)
   ->  Foreign Scan on customers c  (cost=1000.45..201004.91 rows=2 width=902)
(3 rows)

Any insight appreciated!

Thanks,
Jeremy