Search Postgresql Archives

GIN Trigram Index Size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi

Some weeks ago I got some advice here on how to build a trigram index.
Now, I have implemented this in our rather large database, which looks
like this:

We have one table of log events per month, used as partitions via "logs":

 public | logs                     | table    | postgres
 public | logs_01                  | table    | postgres
 public | logs_02                  | table    | postgres
 public | logs_03                  | table    | postgres
 public | logs_04                  | table    | postgres
 public | logs_05                  | table    | postgres
 public | logs_06                  | table    | postgres
 public | logs_07                  | table    | postgres
 public | logs_08                  | table    | postgres
 public | logs_09                  | table    | postgres
 public | logs_10                  | table    | postgres
 public | logs_11                  | table    | postgres
 public | logs_12                  | table    | postgres

and the individual tables have this definition:

                                     Table "public.logs_09"
    Column    |            Type             |
Modifiers
--------------+-----------------------------+----------------------------------------------------
 host         | character varying(255)      |
 facility     | character varying(10)       |
 priority     | character varying(10)       |
 tag          | character varying(255)      |
 log_date     | timestamp without time zone |
 log_month    | integer                     |
 program      | character varying(255)      |
 msg          | text                        |
 seq          | bigint                      | not null default
nextval('logs_seq_seq'::regclass)
 msg_filtered | text                        |
Indexes:
    "logs_09_pkey" PRIMARY KEY, btree (seq)
    "idx_logs_09_lower_host" btree (lower(host::text) varchar_pattern_ops)
    "idx_logs_09_trg_msg" gin (msg gin_trgm_ops)
    "logs_09_date_index" btree (log_date)
Check constraints:
    "logs_09_log_month_check" CHECK (log_month = 9)

It works fine (thanks!) but we have some disk space issues.

Now this tables are huge, about 70 to 100 Mio rows each. Every day, we
insert around 2 mio new rows and also delete 2 mio old rows.

Now for a month that is not active, i.e. I created the index on msg and
no data was written or deleted since then, the index on msg is around 15 GB:

<N (N.oid = C.relnamespace)^J   WHERE nspname IN ('public')^J   AND
relkind = 'i' and relname like '%trg_msg%'^J   ORDER BY 1;

         indexname          | pg_size_pretty
----------------------------+----------------
 public.idx_logs_01_trg_msg | 14 GB
 public.idx_logs_02_trg_msg | 13 GB
 public.idx_logs_03_trg_msg | 17 GB
 public.idx_logs_04_trg_msg | 16 GB
 public.idx_logs_05_trg_msg | 15 GB



But in the index that is currently active, we observe a growth of around
~5 GB per day, which only seems to be stopped by doing a REINDEX. E.g.
here, I did a reindex three days ago, and idx_logs_09_trg_msg has
already grown back from 15 to 36 GB:

 public.idx_logs_09_trg_msg | 36 GB

I have read some discussions about pending list bloat issues, but there
it was suggested that vacuuming the table should reclaim the space, and
this does not seem to the case. The only way I found to reduce the size
is by doing a REINDEX.

The string in msg can be quite large, but average to around 200 characters:

select avg(length(msg)) from (select msg from logs_09 where log_date
between '2015-09-01' and '2015-09-08') x;

avg
----------------------
199.3491688585874446

Any insights on what might be happening here? Can I somehow check if the
space is used up in sparsely filled pages or this list structures? Maybe
sizes like this are just normal given our usage pattern? But it does
seem like quite an extreme growth to me.

Thanks
Christian







-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux