Hi all,
I have a table
defined this way ( a couple of indexes are actually obsolete, the table needs a
little clean ) :
# \d
categories_stat
Table "public.categories_stat"
Column | Type | Modifiers
--------------+-----------------------------+-----------------------------------------------------------------------------
id | numeric(1000,1) | not null default function_get_next_sequence('categories_stat_id_seq'::text)
giorno | timestamp without time zone |
categoria | character varying(50) |
accessi | integer |
bytes2 | integer |
time_elapsed | character varying(10) |
utenti | character varying(50) |
action | integer |
Indexes:
"categories_stat_pkey" PRIMARY KEY, btree (id)
"action_stat_ginidx" gin (to_tsvector('english'::regconfig, action::text))
"action_stat_idx" btree (action)
"categoria_stat_ginidx" gin (to_tsvector('english'::regconfig, categoria::text))
"categoria_stat_idx" btree (categoria)
"categories_stat_giorno_date_idx" brin ((giorno::date))
"categories_stat_giorno_time_idx" brin ((giorno::time without time zone))
"categories_stat_utenti_like_idx" btree (utenti text_pattern_ops)
"giorno_idx" btree (giorno) CLUSTER
"keycategories_stat" btree (id)
"utenti_stat_idx" btree (utenti)
Table "public.categories_stat"
Column | Type | Modifiers
--------------+-----------------------------+-----------------------------------------------------------------------------
id | numeric(1000,1) | not null default function_get_next_sequence('categories_stat_id_seq'::text)
giorno | timestamp without time zone |
categoria | character varying(50) |
accessi | integer |
bytes2 | integer |
time_elapsed | character varying(10) |
utenti | character varying(50) |
action | integer |
Indexes:
"categories_stat_pkey" PRIMARY KEY, btree (id)
"action_stat_ginidx" gin (to_tsvector('english'::regconfig, action::text))
"action_stat_idx" btree (action)
"categoria_stat_ginidx" gin (to_tsvector('english'::regconfig, categoria::text))
"categoria_stat_idx" btree (categoria)
"categories_stat_giorno_date_idx" brin ((giorno::date))
"categories_stat_giorno_time_idx" brin ((giorno::time without time zone))
"categories_stat_utenti_like_idx" btree (utenti text_pattern_ops)
"giorno_idx" btree (giorno) CLUSTER
"keycategories_stat" btree (id)
"utenti_stat_idx" btree (utenti)
The table has about 100M records, and is growing,
we are thinking about partition it by day ( the column "giorno"
).
To speed up some queries, we defined a materialized
view
# \d+
categories_stat_materialized_view
Materialized view "public.categories_stat_materialized_view"
Column | Type | Modifiers | Storage | Stats target | Description
-----------+-----------------------+-----------+----------+--------------+-------------
id | numeric | | main | |
giorno | date | | plain | |
categoria | character varying(50) | | extended | |
accessi | bigint | | plain | |
utenti | character varying(50) | | extended | |
action | integer | | plain | |
Indexes:
"categories_stat_materialized_view_id_idx" UNIQUE, btree (id)
"categories_stat_materialized_view_aggregate_1" btree (utenti, giorno, action)
View definition:
SELECT max(categories_stat.id) AS id,
categories_stat.giorno::date AS giorno,
categories_stat.categoria,
sum(categories_stat.accessi) AS accessi,
categories_stat.utenti,
categories_stat.action
FROM categories_stat
GROUP BY (categories_stat.giorno::date), categories_stat.categoria, categories_stat.utenti, categories_stat.action;
Materialized view "public.categories_stat_materialized_view"
Column | Type | Modifiers | Storage | Stats target | Description
-----------+-----------------------+-----------+----------+--------------+-------------
id | numeric | | main | |
giorno | date | | plain | |
categoria | character varying(50) | | extended | |
accessi | bigint | | plain | |
utenti | character varying(50) | | extended | |
action | integer | | plain | |
Indexes:
"categories_stat_materialized_view_id_idx" UNIQUE, btree (id)
"categories_stat_materialized_view_aggregate_1" btree (utenti, giorno, action)
View definition:
SELECT max(categories_stat.id) AS id,
categories_stat.giorno::date AS giorno,
categories_stat.categoria,
sum(categories_stat.accessi) AS accessi,
categories_stat.utenti,
categories_stat.action
FROM categories_stat
GROUP BY (categories_stat.giorno::date), categories_stat.categoria, categories_stat.utenti, categories_stat.action;
The "id" column was previously defined
as "nextval( somesequence )" and was about 0.1% faster, then we kept max(categories_stat.id) due to internal
standards.
# explain analyze
SELECT max(categories_stat.id) AS id,
categories_stat.giorno::date AS giorno,
categories_stat.categoria,
sum(categories_stat.accessi) AS accessi,
categories_stat.utenti,
categories_stat.action
FROM categories_stat
GROUP BY (categories_stat.giorno::date), categories_stat.categoria, categories_stat.utenti, categories_stat.action;
categories_stat.giorno::date AS giorno,
categories_stat.categoria,
sum(categories_stat.accessi) AS accessi,
categories_stat.utenti,
categories_stat.action
FROM categories_stat
GROUP BY (categories_stat.giorno::date), categories_stat.categoria, categories_stat.utenti, categories_stat.action;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=22436069.43..23036104.47 rows=7273152 width=65) (actual time=186699.351..189719.626 rows=1529896 loops=1)
Group Key: ((giorno)::date), categoria, utenti, action
-> Sort (cost=22436069.43..22508800.95 rows=29092608 width=65) (actual time=186699.335..187540.305 rows=4987051 loops=1)
Sort Key: ((giorno)::date), categoria, utenti, action
Sort Method: external merge Disk: 287784kB
-> Gather (cost=8094034.95..11499463.60 rows=29092608 width=65) (actual time=131126.010..164090.150 rows=4987051 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial GroupAggregate (cost=8093034.95..8589202.80 rows=7273152 width=65) (actual time=137793.619..161235.482 rows=997410 loops=5)
Group Key: ((giorno)::date), categoria, utenti, action
-> Sort (cost=8093034.95..8150928.30 rows=23157340 width=38) (actual time=137793.605..149953.194 rows=18522425 loops=5)
Sort Key: ((giorno)::date), categoria, utenti, action
Sort Method: external merge Disk: 946016kB
-> Parallel Seq Scan on categories_stat (cost=0.00..1370601.75 rows=23157340 width=38) (actual time=0.044..13328.278 rows=18522425 loops=5)
Planning time: 1.323 ms
Execution time: 189956.108 ms
(16 rows)
The key point is that
the periodic refresh of the MV ( REFRESH materialized view concurrently
categories_stat_materialized_view ) takes about 8 minutes ( it takes about
3 to create the MV ).
I'd like to know if there is a way to
speed up the refresh ( or the creation ) .... with some additional indexes,
maybe ?
Thanks in
advance