Search Postgresql Archives

Re: How to speed up pg_trgm / gin index scan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 22, 2015 at 6:51 AM, Christian Ramseyer <rc@xxxxxxxxxxx> wrote:
>
> DM=# \d+ logs_01
>
>     Column    |            Type             |
> --------------+-----------------------------+
>  host         | character varying(255)      |
>  facility     | character varying(10)       |
>  priority     | character varying(10)       |
>  tag          | character varying(255)      |
>  log_date     | timestamp without time zone |
>  program      | character varying(255)      |
>  msg          | text                        |
>  seq          | bigint                      |
>
> Indexes:
>     "logs_01_pkey" PRIMARY KEY, btree (seq)
>     "idx_logs_01_lower_host" btree (lower(host::text) varchar_pattern_ops)
>     "logs_01_date_index" btree (log_date)
>     "tridx_logs_01_msg" gin (msg gin_trgm_ops)
>
>
> DM=# select count(*) from logs_01;
>   count
> ----------
>  83052864
>
>
> I'd like to provide a fast "like %x%" search on the msg column, hence I added a trigram based gin index on it. It is around 60 GB on the 35 GB table:
>
> DM=# select count(*) from logs_01;
>   count
> ----------
>  83052864
>
>
> DM=# \dt+ logs_01
>                      List of relations
>  Schema |  Name   | Type  |  Owner   | Size  | Description
> --------+---------+-------+----------+-------+-------------
>  public | logs_01 | table | postgres | 35 GB |
>
> DM=# \di+ tridx_logs_01_msg
>                                List of relations
>  Schema |       Name        | Type  |  Owner   |  Table  | Size  | Description
> --------+-------------------+-------+----------+---------+-------+-------------
>  public | tridx_logs_01_msg | index | postgres | logs_01 | 58 GB |
>
>

What version of postgres is this? GIN indexes improved a lot in 9.4,
they use less than half the space and have doubled the speed (on
average).

Now, whatever version you have; a GIN index has two data structures,
the main one in which the index entries are stored as key-value pairs
(please someone correct my description of the situation) and a pending
list, which is a temporary unsorted list of pending entries in which
all the newly inserted tuples arrive until a VACUUM (or until the
pending list grows upto work_mem) moves that list into the main
structure.

That happens to avoid the penalty of inserting new rows in the main
structure which could be expensive.
But while the pending list grows the speed of the index decreases. And
because you have work_mem in 16Gb your pending list is possibly
growing without control.

if you have 9.3 or superior you can know how big is that pending list
installing pgstattuple.

CREATE EXTENSION pgstattuple;
SELECT * FROM pgstatginindex('tridx_logs_01_msg');

NOTE: remember that pending_pages is expressed in 8kb-pages

if that is the problem or if you are in <= 9.2 then try VACUUM the table

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux