Hello all,
I faced strange behavior of PostgreSQL during the query execution.
So, I have to databases: local and foreign. There are foreign server definitions in the local database (via postgres_fdw). The local database has table 'local_table'. The foreign database has table 'foreign_table'. Both of them have only 1 column: 'primary_uuid'. This column in both databases is a primary key column. Schema on a local server that stores remote server definitions is 'foreign_server'. Each table has 100K rows. Vacuum analyze has been run for both servers.
When I run a query:
SELECT *
FROM
(
SELECT foreign_table.primary_uuid
FROM foreign_server.foreign_table
UNION ALL
SELECT local_table.primary_uuid
FROM local_table
)
join_view
WHERE
join_view.primary_uuid in (select '19b2db7e-db89-48eb-90b1-0bd468a2346b'::uuid)
I expect that the server will use the pkey index for the local table. But it uses seq scan instead!
"Hash Semi Join (cost=100.03..3346.23 rows=51024 width=16) (actual time=482.235..482.235 rows=0 loops=1)"
" Output: foreign_table.primary_uuid"
" Hash Cond: (foreign_table.primary_uuid = ('ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid))"
" -> Append (cost=100.00..2510.68 rows=102048 width=16) (actual time=0.529..463.563 rows=200000 loops=1)"
" -> Foreign Scan on foreign_server.foreign_table (cost=100.00..171.44 rows=2048 width=16) (actual time=0.528..446.715 rows=100000 loops=1)"
" Output: foreign_table.primary_uuid"
" Remote SQL: SELECT primary_uuid FROM public.foreign_table"
" -> Seq Scan on public.local_table (cost=0.00..1829.00 rows=100000 width=16) (actual time=0.021..6.358 rows=100000 loops=1)"
" Output: local_table.primary_uuid"
" -> Hash (cost=0.02..0.02 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=1)"
" Output: ('ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid)"
" Buckets: 1024 Batches: 1 Memory Usage: 9kB"
" -> Result (cost=0.00..0.01 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=1)"
" Output: 'ef89a151-3eab-42af-8ecc-8850053aa1bb'::uuid"
"Planning Time: 0.126 ms"
"Execution Time: 482.572 ms""Execution Time: 509.315 ms"
So, as you can see, the execution time is 509 ms! It could be very fast if PostgreSQL used primary key index!
Also, please, note, that SQL without WHERE clause has been set to the foreign server:
" Remote SQL: SELECT primary_uuid FROM public.foreign_table"
So, the optimizer doesn't select optimal plans for such executions :(
Looks like it's an optimizer inadequacy.
Does someone know, how to optimize this query without query rewriting (queries like this are generated from the Data Access layer and it's hard to rebuild that layer)?
Thank you
P.S.: Answers to standard questions:
> PostgreSQL version number you are running:
PostgreSQL 11.2, compiled by Visual C++ build 1914, 64-bit
> How you installed PostgreSQL:
By downloaded standard Windows 64 installer
> Changes made to the settings in the postgresql.conf file: shared_preload_libraries = '$libdir/pg_stat_statements'
> Operating system and version:
Windows 10 Enterprise 64-bit
> What program you're using to connect to PostgreSQL:
pgAdmin III
> Is there anything relevant or unusual in the PostgreSQL server logs?:
Nope
P.P.S.: DDL scripts:
for the foreign database:
CREATE TABLE public.foreign_table
(
primary_uuid uuid NOT NULL,
CONSTRAINT "PKEY" PRIMARY KEY (primary_uuid)
)
for local database:
CREATE TABLE public.local_table
(
primary_uuid uuid NOT NULL,
CONSTRAINT local_table_pkey PRIMARY KEY (primary_uuid)
)
ᐧ