Search Postgresql Archives

Re: advice for efresh of materialized view

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



 

From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Ivan Fabris, Gruppo ColliniConsulting.it
Sent: Wednesday, April 12, 2017 6:15 AM
To: 'pgsql-general@xxxxxxxxxxxxxx' <pgsql-general@xxxxxxxxxxxxxx>
Subject: advice for efresh of materialized view

 

Attention: This email was sent from someone outside of Perceptron. Always exercise caution when opening attachments or clicking links from unknown senders or when receiving unexpected emails.

 

Hi all,

I have a table defined this way ( a couple of indexes are actually obsolete, the table needs a little clean ) :

 

 

# \d categories_stat
                                              Table "public.categories_stat"
    Column    |            Type             |                                  Modifiers
--------------+-----------------------------+-----------------------------------------------------------------------------
 id           | numeric(1000,1)             | not null default function_get_next_sequence('categories_stat_id_seq'::text)
 giorno       | timestamp without time zone |
 categoria    | character varying(50)       |
 accessi      | integer                     |
 bytes2       | integer                     |
 time_elapsed | character varying(10)       |
 utenti       | character varying(50)       |
 action       | integer                     |
Indexes:
    "categories_stat_pkey" PRIMARY KEY, btree (id)
    "action_stat_ginidx" gin (to_tsvector('english'::regconfig, action::text))
    "action_stat_idx" btree (action)
    "categoria_stat_ginidx" gin (to_tsvector('english'::regconfig, categoria::text))
    "categoria_stat_idx" btree (categoria)
    "categories_stat_giorno_date_idx" brin ((giorno::date))
    "categories_stat_giorno_time_idx" brin ((giorno::time without time zone))
    "categories_stat_utenti_like_idx" btree (utenti text_pattern_ops)
    "giorno_idx" btree (giorno) CLUSTER
    "keycategories_stat" btree (id)
    "utenti_stat_idx" btree (utenti)

 

 

The table has about 100M records, and is growing, we are thinking about partition it by day ( the column "giorno" ).

To speed up some queries, we defined a materialized view

 

 

# \d+ categories_stat_materialized_view
             Materialized view "public.categories_stat_materialized_view"
  Column   |         Type          | Modifiers | Storage  | Stats target | Description
-----------+-----------------------+-----------+----------+--------------+-------------
 id        | numeric               |           | main     |              |
 giorno    | date                  |           | plain    |              |
 categoria | character varying(50) |           | extended |              |
 accessi   | bigint                |           | plain    |              |
 utenti    | character varying(50) |           | extended |              |
 action    | integer               |           | plain    |              |
Indexes:
    "categories_stat_materialized_view_id_idx" UNIQUE, btree (id)
    "categories_stat_materialized_view_aggregate_1" btree (utenti, giorno, action)
View definition:
 SELECT max(categories_stat.id) AS id,
    categories_stat.giorno::date AS giorno,
    categories_stat.categoria,
    sum(categories_stat.accessi) AS accessi,
    categories_stat.utenti,
    categories_stat.action
   FROM categories_stat
  GROUP BY (categories_stat.giorno::date), categories_stat.categoria, categories_stat.utenti, categories_stat.action;

 

 

The "id" column was previously defined as "nextval( somesequence )" and was about 0.1% faster, then we kept max(categories_stat.id) due to internal standards.

 

 

# explain analyze SELECT max(categories_stat.id) AS id,
    categories_stat.giorno::date AS giorno,
    categories_stat.categoria,
    sum(categories_stat.accessi) AS accessi,
    categories_stat.utenti,
    categories_stat.action
   FROM categories_stat
  GROUP BY (categories_stat.giorno::date), categories_stat.categoria, categories_stat.utenti, categories_stat.action;

 


                                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=22436069.43..23036104.47 rows=7273152 width=65) (actual time=186699.351..189719.626 rows=1529896 loops=1)
   Group Key: ((giorno)::date), categoria, utenti, action
   ->  Sort  (cost=22436069.43..22508800.95 rows=29092608 width=65) (actual time=186699.335..187540.305 rows=4987051 loops=1)
         Sort Key: ((giorno)::date), categoria, utenti, action
         Sort Method: external merge  Disk: 287784kB
         ->  Gather  (cost=8094034.95..11499463.60 rows=29092608 width=65) (actual time=131126.010..164090.150 rows=4987051 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Partial GroupAggregate  (cost=8093034.95..8589202.80 rows=7273152 width=65) (actual time=137793.619..161235.482 rows=997410 loops=5)
                     Group Key: ((giorno)::date), categoria, utenti, action
                     ->  Sort  (cost=8093034.95..8150928.30 rows=23157340 width=38) (actual time=137793.605..149953.194 rows=18522425 loops=5)
                           Sort Key: ((giorno)::date), categoria, utenti, action
                           Sort Method: external merge  Disk: 946016kB
                           ->  Parallel Seq Scan on categories_stat  (cost=0.00..1370601.75 rows=23157340 width=38) (actual time=0.044..13328.278 rows=18522425 loops=5)
 Planning time: 1.323 ms
 Execution time: 189956.108 ms
(16 rows)

 

 

The key point is that the periodic refresh of the MV ( REFRESH materialized view concurrently categories_stat_materialized_view )  takes about 8 minutes ( it takes about 3 to create the MV ).

I'd like to know if there is a way to speed up the refresh ( or the creation ) .... with some additional indexes, maybe ?

Thanks in advance

 

 ____________________________________________________________________________________________________

 

Also, in regards to:

 

         Sort Method: external merge  Disk: 287784kB

 

I’d increase work_mem setting, to avoid on-disk sorting.

 

Regards,

Igor Neyman

 

 

 

 


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux