Search Postgresql Archives

Re: Problems pushing down WHERE-clause to underlying view

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/15/19 9:27 AM, Nicklas Avén wrote:

On 2/15/19 5:06 PM, Adrian Klaver wrote:
 > On 2/15/19 7:28 AM, Nicklas Avén wrote:
 >> Hi
 >>
>> The problem is that it always calculates all those 22000 rows even if the user id I use only gives 250 rows.
 >>
 >> So, the query uses 4 seconds instead of under 100 ms.
 >
 > https://www.postgresql.org/docs/10/sql-createview.html
 >
> "CREATE VIEW defines a view of a query. The view is not physically materialized. Instead, the query is run every time the view is referenced in a query."

 >



Sorry, I must have expressed what I mean bad. Of course a view is not materialized.
I will explain without views what I mean here below



 > Might want to look at materialized view:
 > https://www.postgresql.org/docs/10/sql-creatematerializedview.html
 >
> "CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that it also remembers the query used to initialize the view, so that it can be refreshed later upon demand. A materialized view has many of the same properties as a table, but there is no support for temporary materialized views or automatic generation of OIDs."

 >


No, materialized views is not an option. We get some data into those tables daily. Recalculating the full dataset on a lot of views like this doesn't make sense. Instead we have tables maintained with processed new data. But I want this last part of logic on top as views for flexibility,
to not need cached tables for each possible type of grouping that we need.
Started out with materialized views and it didn't work out well.




 >
> I would also suggest running the EXPLAIN below with ANALYZE so actual timings are returned. Also try:
 >
 > SELECT
 >     *
 > FROM
 >     underlying_view AS b
 > JOIN
 >     contractor_access AS b
 > ON
 >     a.machine_key = b.machine_key
 > WHERE
 >     user_id = 'name@email.address'
 >>

Sorry again, I didn't mention. This I have tried this since this is what the top level view do. So first step when trying to understand this was (of course) to apply the where-clause directly to the query



So, let's do that also on the underlying query (view) .


Here I have 2 queries, where I apply the where clause directly to the query in the underlying view

(joining the contractor_access table directly on that query).

The first takes 30-40 ms ms and returns the same 250 rows as the second.
In the first I use the machine_key in the where clause.
In the second query that takes about 16 seconds to return the same 250 rows I use the user_id in the contractor_access table.

I have also cleaned up the contractor_access table. So there is only 1 row now, with my email as user_id and the same machine_key as used in the first query.


I have not had chance to fully go through all of below. Some questions/suggestions:

1) Thanks for the formatted queries. If I could make a suggestion, when aliasing could you include AS. It would make finding what l.* refers to easier for those of us with old eyes:)

2) t4e_contractor_id is in the shiny_adm.contractor_access table?
If not where?

3) What is the schema for shiny_adm.contractor_access?
In particular what indexes are on it?



Query 1:


EXPLAIN ANALYZE
SELECT
     l.machine_key,
     o.object_name,
     o.sub_object_name,
     o.object_user_id,
     o.sub_object_user_id,
     o.start_date AS object_start_date,
     s.species_group_name,
     p.product_group_name,
     l.m3_sub AS volume_m3sub,
     l.number_of_logs,
     mi.basemachine_manufacturer,
     mi.basemachine_model
    FROM ( SELECT hl.contractor_id,
             hl.machine_key,
             hl.operator_key,
             hl.object_key,
             hl.sub_object_key,
             date(hl.harvest_date) AS harvest_date,
             hl.species_group_key,
             hl.product_key,
             sum(hl.m3_sub) AS m3_sub,
             count(*) AS number_of_logs
            FROM version_union_tables_r02.harvester_logs hl
          GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), hl.species_group_key, hl.product_key) l      LEFT JOIN version_union_tables_r02.machine_info mi ON l.machine_key::text = mi.machine_key::text      LEFT JOIN version_union_tables_r02.objects o ON l.machine_key::text = o.machine_key::text AND l.object_key = o.object_key AND l.sub_object_key = o.sub_object_key      LEFT JOIN version_union_tables_r02.products p ON l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key      LEFT JOIN version_union_tables_r02.species s ON l.machine_key::text = s.machine_key::text AND l.species_group_key = s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where l.machine_key = '887655635442600'
;

which results in this query plan

Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual time=25.804..27.134 rows=250 loops=1)   ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1 width=19) (actual time=0.009..0.010 rows=1 loops=1)
         Filter: (machine_key = '887655635442600'::text)
  ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122) (actual time=25.793..26.959 rows=250 loops=1)         Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text) AND (hl.species_group_key = s.species_group_key))         ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624 width=120) (actual time=25.755..26.763 rows=250 loops=1)               Hash Cond: (((hl.machine_key)::text = (p.machine_key)::text) AND (hl.product_key = p.product_key))               ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624 width=118) (actual time=25.706..26.543 rows=250 loops=1)                     Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND (hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))                     ->  Hash Left Join  (cost=61799.78..62619.90 rows=22624 width=65) (actual time=25.668..26.327 rows=250 loops=1)                           Hash Cond: ((hl.machine_key)::text = (mi.machine_key)::text)                           ->  HashAggregate (cost=61796.99..62079.79 rows=22624 width=69) (actual time=25.627..26.132 rows=250 loops=1)                                 Group Key: hl.machine_key, hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, date(hl.harvest_date), hl.species_group_key, hl.product_key                                 ->  Bitmap Heap Scan on harvester_logs hl  (cost=570.14..61224.14 rows=22914 width=61) (actual time=0.909..11.573 rows=24151 loops=1)                                       Recheck Cond: ((machine_key)::text = '887655635442600'::text)
                                       Heap Blocks: exact=538
                                      ->  Bitmap Index Scan on version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41 rows=22914 width=0) (actual time=0.870..0.870 rows=24151 loops=1)                                             Index Cond: ((machine_key)::text = '887655635442600'::text)                           ->  Hash  (cost=2.77..2.77 rows=1 width=38) (actual time=0.023..0.023 rows=1 loops=1)                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB                                 ->  Seq Scan on machine_info mi (cost=0.00..2.77 rows=1 width=38) (actual time=0.018..0.019 rows=1 loops=1)                                       Filter: ((machine_key)::text = '887655635442600'::text)
                                       Rows Removed by Filter: 61
                    ->  Hash  (cost=16.12..16.12 rows=4 width=84) (actual time=0.025..0.026 rows=3 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Bitmap Heap Scan on objects o (cost=4.31..16.12 rows=4 width=84) (actual time=0.020..0.021 rows=3 loops=1)                                 Recheck Cond: ((machine_key)::text = '887655635442600'::text)
                                 Heap Blocks: exact=1
                                ->  Bitmap Index Scan on version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4 width=0) (actual time=0.015..0.015 rows=3 loops=1)                                       Index Cond: ((machine_key)::text = '887655635442600'::text)               ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual time=0.037..0.037 rows=26 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
                    ->  Bitmap Heap Scan on products p (cost=4.48..38.19 rows=26 width=35) (actual time=0.015..0.026 rows=26 loops=1)                           Recheck Cond: ((machine_key)::text = '887655635442600'::text)
                           Heap Blocks: exact=1
                          ->  Bitmap Index Scan on version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26 width=0) (actual time=0.009..0.009 rows=26 loops=1)                                 Index Cond: ((machine_key)::text = '887655635442600'::text)         ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual time=0.028..0.029 rows=12 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Bitmap Heap Scan on species s (cost=4.37..10.52 rows=12 width=37) (actual time=0.016..0.021 rows=12 loops=1)                     Recheck Cond: ((machine_key)::text = '887655635442600'::text)
                     Heap Blocks: exact=1
                    ->  Bitmap Index Scan on version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12 width=0) (actual time=0.008..0.008 rows=12 loops=1)                           Index Cond: ((machine_key)::text = '887655635442600'::text)
Planning time: 0.434 ms
Execution time: 27.370 ms



Next query, the slow one that calculates the whole dataset:


EXPLAIN ANALYZE
SELECT
     l.machine_key,
     o.object_name,
     o.sub_object_name,
     o.object_user_id,
     o.sub_object_user_id,
     o.start_date AS object_start_date,
     s.species_group_name,
     p.product_group_name,
     l.m3_sub AS volume_m3sub,
     l.number_of_logs,
     mi.basemachine_manufacturer,
     mi.basemachine_model
    FROM ( SELECT hl.contractor_id,
             hl.machine_key,
             hl.operator_key,
             hl.object_key,
             hl.sub_object_key,
             date(hl.harvest_date) AS harvest_date,
             hl.species_group_key,
             hl.product_key,
             sum(hl.m3_sub) AS m3_sub,
             count(*) AS number_of_logs
            FROM version_union_tables_r02.harvester_logs hl
          GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), hl.species_group_key, hl.product_key) l      LEFT JOIN version_union_tables_r02.machine_info mi ON l.machine_key::text = mi.machine_key::text      LEFT JOIN version_union_tables_r02.objects o ON l.machine_key::text = o.machine_key::text AND l.object_key = o.object_key AND l.sub_object_key = o.sub_object_key      LEFT JOIN version_union_tables_r02.products p ON l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key      LEFT JOIN version_union_tables_r02.species s ON l.machine_key::text = s.machine_key::text AND l.species_group_key = s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where t4e_contractor_id = 'nicklas.aven@xxxxxxxxxxxxx';


results in this query plan:

Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122) (actual time=15860.900..15888.766 rows=250 loops=1)   Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND (hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))   ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69) (actual time=15859.604..15887.287 rows=250 loops=1)
         Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
         Join Filter: (hl.species_group_key = s.species_group_key)
         Rows Removed by Join Filter: 2750
        ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044 width=67) (actual time=15859.072..15884.912 rows=250 loops=1)
               Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
               Join Filter: (hl.product_key = p.product_key)
               Rows Removed by Join Filter: 6250
              ->  Merge Left Join  (cost=1779788.20..2022471.20 rows=4044 width=65) (actual time=15857.473..15879.504 rows=250 loops=1)                     Merge Cond: ((hl.machine_key)::text = (mi.machine_key)::text)                     ->  Merge Join  (cost=1779783.74..2022437.81 rows=4044 width=48) (actual time=15857.359..15879.102 rows=250 loops=1)                           Merge Cond: ((hl.machine_key)::text = ci.machine_key)                           ->  GroupAggregate (cost=1779782.72..2012287.44 rows=808712 width=69) (actual time=15088.353..15878.172 rows=2683 loops=1)                                 Group Key: hl.machine_key, hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), hl.species_group_key, hl.product_key                                 ->  Sort (cost=1779782.72..1800000.52 rows=8087121 width=61) (actual time=15088.336..15488.144 rows=942552 loops=1)                                       Sort Key: hl.machine_key, hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), hl.species_group_key, hl.product_key                                       Sort Method: external merge Disk: 543456kB                                       ->  Seq Scan on harvester_logs hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual time=0.007..3169.984 rows=8084464 loops=1)                           ->  Sort  (cost=1.02..1.03 rows=1 width=19) (actual time=0.019..0.020 rows=1 loops=1)
                                 Sort Key: ci.machine_key
                                 Sort Method: quicksort  Memory: 25kB
                                ->  Seq Scan on contractor_access ci (cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)                                       Filter: (t4e_contractor_id = 'nicklas.aven@xxxxxxxxxxxxx'::text)                     ->  Sort  (cost=4.47..4.62 rows=62 width=38) (actual time=0.102..0.173 rows=266 loops=1)
                           Sort Key: mi.machine_key
                           Sort Method: quicksort  Memory: 30kB
                          ->  Seq Scan on machine_info mi (cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.030 rows=62 loops=1)               ->  Sort  (cost=127.50..131.23 rows=1491 width=35) (actual time=1.205..3.071 rows=7204 loops=1)
                     Sort Key: p.machine_key
                     Sort Method: quicksort  Memory: 175kB
                    ->  Seq Scan on products p  (cost=0.00..48.91 rows=1491 width=35) (actual time=0.004..0.497 rows=1491 loops=1)         ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual time=0.385..1.233 rows=3259 loops=1)
               Sort Key: s.machine_key
               Sort Method: quicksort  Memory: 65kB
              ->  Seq Scan on species s  (cost=0.00..10.60 rows=460 width=37) (actual time=0.004..0.146 rows=460 loops=1)   ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual time=1.286..1.287 rows=1690 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 190kB
        ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725 width=84) (actual time=0.004..0.600 rows=1725 loops=1)
Planning time: 0.527 ms
Execution time: 15945.641 ms




Thanks

Nicklas



--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux