Re: Indices and Foreign Tables

"Kohler Manuel (ID SIS)" <manuel.kohler@xxxxxxxxxx> · Fri, 4 Apr 2014 15:14:44 +0000

Hi,
here are the Query plans. The first plan is on the source database directly. So no fdw involved:

source_db=# EXPLAIN ANALYZE select ds.code, count(*), sum(dsf.size_in_bytes) as "raw_size",pg_size_pretty(sum(dsf.size_in_bytes)) as "size" from data_set_files dsf, data_sets ds where dsf.parent_id is null and dsf.dase_id=ds.id group by ds.code order by raw_size desc;
                                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=341248.80..341281.56 rows=13103 width=34) (actual time=695.519..724.286 rows=13839 loops=1)
   Sort Key: (sum(dsf.size_in_bytes))
   Sort Method: quicksort  Memory: 2283kB
   ->  HashAggregate  (cost=340188.93..340352.71 rows=13103 width=34) (actual time=536.229..615.115 rows=13839 loops=1)
         ->  Nested Loop  (cost=0.56..340057.90 rows=13103 width=34) (actual time=0.104..456.139 rows=13839 loops=1)
               ->  Seq Scan on data_sets ds  (cost=0.00..385.39 rows=13839 width=34) (actual time=0.022..40.113 rows=13839 loops=1)
               ->  Index Scan using data_set_files_dase_id_parent_id_idx on data_set_files dsf  (cost=0.56..24.43 rows=11 width=16) (actual time=0.015..0.019 rows=1 loops=13839)
                     Index Cond: (((dase_id)::bigint = ds.id) AND (parent_id IS NULL))
 Total runtime: 752.695 ms
(9 rows)

Here is the same query with fdw:

db=# EXPLAIN VERBOSE select ds.code, count(*), sum(dsf.size_in_bytes) as "raw_size",pg_size_pretty(sum(dsf.size_in_bytes)) as "size" from data_set_files_fdw dsf, data_sets_fdw ds where dsf.parent_id is null and dsf.dase_id=ds.id group by ds.code order by raw_size desc;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Sort  (cost=327.81..327.97 rows=64 width=40)
   Output: ds.code, (count(*)), (sum(dsf.size_in_bytes)), (pg_size_pretty(sum(dsf.size_in_bytes)))
   Sort Key: (sum(dsf.size_in_bytes))
   ->  HashAggregate  (cost=325.09..325.89 rows=64 width=40)
         Output: ds.code, count(*), sum(dsf.size_in_bytes), pg_size_pretty(sum(dsf.size_in_bytes))
         ->  Hash Join  (cost=270.61..324.45 rows=64 width=40)
               Output: dsf.size_in_bytes, ds.code
               Hash Cond: (ds.id = (dsf.dase_id)::bigint)
               ->  Foreign Scan on public.data_sets_fdw ds  (cost=100.00..148.40 rows=1280 width=40)
                     Output: ds.id, ds.code, ds.location
                     Remote SQL: SELECT id, code FROM public.data_sets
               ->  Hash  (cost=170.48..170.48 rows=10 width=16)
                     Output: dsf.size_in_bytes, dsf.dase_id
                     ->  Foreign Scan on public.data_set_files_fdw dsf  (cost=100.00..170.48 rows=10 width=16)
                           Output: dsf.size_in_bytes, dsf.dase_id
                           Filter: (dsf.parent_id IS NULL)
                           Remote SQL: SELECT dase_id, parent_id, size_in_bytes FROM public.data_set_files
(17 rows)

And also with ANALYZE:

db=# EXPLAIN ANALYZE select ds.code, count(*), sum(dsf.size_in_bytes) as "raw_size",pg_size_pretty(sum(dsf.size_in_bytes)) as "size" from data_set_files_fdw dsf, data_sets_fdw ds where dsf.parent_id is null and dsf.dase_id=ds.id group by ds.code order by raw_size desc;
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=327.81..327.97 rows=64 width=40) (actual time=219401.864..219414.641 rows=13839 loops=1)
   Sort Key: (sum(dsf.size_in_bytes))
   Sort Method: quicksort  Memory: 2283kB
   ->  HashAggregate  (cost=325.09..325.89 rows=64 width=40) (actual time=219327.664..219363.709 rows=13839 loops=1)
         ->  Hash Join  (cost=270.61..324.45 rows=64 width=40) (actual time=219127.848..219277.308 rows=13839 loops=1)
               Hash Cond: (ds.id = (dsf.dase_id)::bigint)
               ->  Foreign Scan on data_sets_fdw ds  (cost=100.00..148.40 rows=1280 width=40) (actual time=1.057..77.415 rows=13839 loops=1)
               ->  Hash  (cost=170.48..170.48 rows=10 width=16) (actual time=219126.713..219126.713 rows=13839 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 649kB
                     ->  Foreign Scan on data_set_files_fdw dsf  (cost=100.00..170.48 rows=10 width=16) (actual time=1082.614..219083.326 rows=13839 loops=1)
                           Filter: (parent_id IS NULL)
                           Rows Removed by Filter: 35726596
 Total runtime: 219438.925 ms
(13 rows)

Regards
Manuel

-- 
Kohler Manuel (ID SIS)
Research Informatics, Scientific IT Services (ID ETHZ)
Quantitative Genomics Facility (QGF), D-BSSE
ETH Zurich, Mattenstr. 26 (1078 1.02), CH-4058 Basel, +41 61 387 3132

On 04 Apr 2014, at 17:01, Albe Laurenz <laurenz.albe@xxxxxxxxxx> wrote:

> Kohler Manuel wrote:
>> I have a question regarding the use of indices when querying foreign data tables (using postgres_fdw
>> of 9.3 to another postgres DB).
>> Everything works fine beside the fact that the indices which are defined in the foreign DB are not
>> used at all when I do query it through the foreign data wrapper. This leads to an extremely slow query
>> performance.
>> 
>> Here is what I did:
>> 
>> CREATE EXTENSION postgres_fdw;
>> 
>> CREATE SERVER app_db
>>  FOREIGN DATA WRAPPER postgres_fdw
>>  OPTIONS (dbname ‘<DBNAME>', host 'localhost');
>> 
>> CREATE USER MAPPING for openbis
>>  SERVER app_db
>>  OPTIONS (user ‘<USRE>', password ‘<PW>');
>> 
>> CREATE FOREIGN TABLE data_sets_fdw
>>  (
>>    id bigint,
>>    code code,
>>    location file_path
>>  )
>>  SERVER app_db OPTIONS (table_name 'data_sets’);
>> 
>> The data_sets_fdw corresponds to the table in the other DB which is defined as this:
>> 
>> db=> \d data_sets
>>                           Table "public.data_sets"
>>  Column  |   Type    |                       Modifiers
>> ----------+-----------+--------------------------------------------------------
>> id       | bigint    | not null default nextval('data_sets_id_seq'::regclass)
>> code     | code      | not null
>> location | file_path | not null
>> Indexes:
>>    "data_sets_pkey" PRIMARY KEY, btree (id)
>>    "data_sets_code_key" UNIQUE CONSTRAINT, btree (code)
>>    "data_sets_code_idx" btree (code)
>> 
>> When I  realised that the queries are so slow I ran an EXPLAIN ANALYZE which shows that the indices
>> are not used. I can also post the query plan if it helps.
>> 
>> So am I doing something wrong here, or is this not possible that the indices are used by the fdw?
> 
> The indices should be used.
> 
> Did you ANALYZE the remote table on the remote database?
> What is the remote query (EXPLAIN VERBOSE)?
> What do you get for EXPLAIN ANALYZE of the remote query when executed on the remote database?
> 
> Yours,
> Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general