Search Postgresql Archives

Problems pushing down WHERE-clause to underlying view

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi


We have a system with 2 layers of views. It is about forestry.

The first layer contains the logic like grouping volumes in logs together to stems or harvesting areas and joining species names to codes and things like that.

The second layer just joins this underlying views to a table with user ids and machine ids. So, when used by the application there is a where clause containing the user id which gives access to the correct data

in the underlying view based on machine id.


The underlying view in this case can return approx 22000 rows, grouped from approx 8 million logs by harvest date, harvest object and so on.


The problem is that it always calculates all those 22000 rows even if the user id I use only gives 250 rows.

So, the query uses 4 seconds instead of under 100 ms.


I have tried this on 2 servers and my laptop, PostgreSQL 9.6 and 10 and get the same issue.


I have tried to pick the query apart to understand what is happening.


First, the underlying view looks like this except I removed some fields that doesn't affect the case to save some space:

CREATE OR REPLACE VIEW underlying_view AS
 SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    s.species_group_name,
    p.product_group_name ,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
   FROM
    (
       SELECT
        hl.contractor_id,
        hl.machine_key,
        hl.operator_key,
        hl.object_key,
        hl.sub_object_key,
        date(hl.harvest_date) AS harvest_date, --this is timestamptz since we use the time in other places
        hl.species_group_key,
        hl.product_key,
        sum(hl.m3_sub) AS m3_sub,
        count(*) AS number_of_logs
    FROM version_union_tables_r02.harvester_logs hl
    GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), hl.species_group_key, hl.product_key
    ) l
     LEFT JOIN version_union_tables_r02.machine_info mi ON l.machine_key::text = mi.machine_key::text      LEFT JOIN version_union_tables_r02.objects o ON l.machine_key::text = o.machine_key::text AND l.object_key = o.object_key AND l.sub_object_key = o.sub_object_key      LEFT JOIN version_union_tables_r02.products p ON l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key      LEFT JOIN version_union_tables_r02.species s ON l.machine_key::text = s.machine_key::text AND l.species_group_key = s.species_group_key;


In the next level is a view that looks like this:


CREATE OR REPLACE VIEW top_level_view AS
 SELECT *
   FROM underlying_view a
     JOIN contractor_access b ON a.machine_key = b.machine_key WHERE b.active <> 0;


If I query this top_level_view like:

SELECT * FROM top_level_view WHERE user_id = 'name@email.address';

I haven't succeeded to avoid the underlying view to compute the full dataset.


The user_id 'name@email.address' returns 1 machine_key from contractor_access table that has any hits in the underlying view (4 in total but 3 machines are not represented in the underlying view)

We call the machine that we get from contractor_id table 'machine1'

Here is what I have tried to find out when the plan changes:


1) SELECT * FROM underlying_view WHERE machine_key = 'machine1'; -- returns 250 rows in approx 100 ms, so, works as expected


2) select * from (select 'machine1' machine_key) a, underlying_view b where a.machine_key = b.machine_key; --same as above, works as expected


3) select * from (select * from contractor_access where user_id = 'name@email.address') a, underlying_view b where a.machine_key = b.machine_key;      -- Here I am hit. this returns the same 250 rows, but in over 4 seconds


/*So I thought I should try to force down the machine_key to the underlying view with lateral like this*/


4) select * from (select * from contractor_access where user_id = 'name@email.address') a,
lateral (select * from underlying_view where machine_key = a.machine_key) b;

But this doesn't work either. It returns the same 250 rows in approx 4 seconds.


My question is, is there some trick to force the planner to push down the machine_key.

I cannot understand what is fooling the planner.

The table is analyzed, I have tried on several machines, so I do not think it is miss leading statistics.


I haven't done any configuration more than tried with max_parallel_workers_per_gather to 0 since the workers makes it harder to understand what is happening.


Here is the quer plan on query number 3 above:

EXPLAIN select * from (select * from contractor_access where user_id = 'name@email.address') a,
underlying_view b where a.machine_key = b.machine_key;

Aggregate  (cost=543839.03..543839.04 rows=1 width=8)
  ->  Hash Join  (cost=395402.74..543798.72 rows=16123 width=0)
        Hash Cond: ((hl.machine_key)::text = contractor_access.machine_key)
        ->  Hash Left Join  (cost=395395.10..533563.59 rows=806147 width=400)               Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text) AND (hl.species_group_key = s.species_group_key))               ->  Hash Left Join  (cost=395380.73..485122.31 rows=806147 width=32)                     Hash Cond: (((hl.machine_key)::text = (p.machine_key)::text) AND (hl.product_key = p.product_key))                     ->  Hash Left Join (cost=395320.48..444697.18 rows=806147 width=36)                           Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND (hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))                           ->  Hash Left Join (cost=395257.01..417426.05 rows=806147 width=36)                                 Hash Cond: ((hl.machine_key)::text = (mi.machine_key)::text)                                 Join Filter: (((hl.contractor_id)::text = (mi.contractor_id)::text) OR ((hl.contractor_id IS NULL) AND (mi.contractor_id IS NULL)))                                 ->  HashAggregate (cost=395254.66..403316.13 rows=806147 width=86)                                       Group Key: hl.machine_key, hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, hl.harvest_date, hl.species_group_key, hl.product_key                                       ->  Seq Scan on harvester_logs hl  (cost=0.00..234025.22 rows=8061472 width=54)                                 ->  Hash  (cost=1.60..1.60 rows=60 width=35)                                       ->  Seq Scan on machine_info mi  (cost=0.00..1.60 rows=60 width=35)
                          ->  Hash  (cost=33.26..33.26 rows=1726 width=23)
                                ->  Seq Scan on objects o (cost=0.00..33.26 rows=1726 width=23)
                    ->  Hash  (cost=37.90..37.90 rows=1490 width=29)
                          ->  Seq Scan on products p (cost=0.00..37.90 rows=1490 width=29)
              ->  Hash  (cost=7.55..7.55 rows=455 width=31)
                    ->  Seq Scan on species s  (cost=0.00..7.55 rows=455 width=31)
        ->  Hash  (cost=7.59..7.59 rows=4 width=21)
              ->  Seq Scan on contractor_access  (cost=0.00..7.59 rows=4 width=21)                     Filter: (t4e_contractor_id = 'name@email.address'::text)



Thanks


Nicklas Avén












[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux