Re: Partial indexes instead of partitions

Leonardo F <m_lists@xxxxxxxx> · Fri, 11 Jun 2010 13:15:26 +0000 (GMT)

> Well the situation is still ambiguous 
> so:
> Is it possible to provide this table and indexes definitions?
> And it 
> would be great it you describe the queries you are going to do
> on this table 
> or just provide the SQL.

Sure!
Basically what I'm trying to do is to partition the index in the table 
where the data is going to be inserted into smaller indexes, but
without using partitions: I would use partial indexes.
"Historic" data will have just the big index... 

say that I want to store 1G rows: 100M per day, 10 days.
I would have 10 tables, 9 of them with 2 big indexes (the indexes
on the 2 columns that are going to be used in queries together
with the timestamp) and the latest one with 24*2 smaller indexes
(so that insertion will still be fast) to be dropped overnight after
the 2 big indexes have been created... then a new table is created
(for the new day's data) with the small indexes and the oldest table
dropped (as I said, I won't store more than 10 days).

This is "pseudo SQL":

CREATE TABLE master
(
   ts timestamp,
   key1 bigint,  <-- populated with almost-random values
   key2 bigint,  <-- populated with almost-random values
   data1 varchar(20),
   [...]
);

CREATE TABLE master_01 ( 
CHECK ( ts >= DATE '2006-03-01' AND ts < DATE '2006-03-02' )
) INHERITS (master);

CREATE INDEX master_01_ix1 ON master_01 (key1);
CREATE INDEX master_01_ix2 ON master_01 (key2)

CREATE TABLE master_02 ( 
CHECK ( ts >= DATE '2006-03-02' AND ts < DATE '2006-03-03' )
) INHERITS (master);

CREATE INDEX master_02_ix1 ON master_02 (key1);
CREATE INDEX master_02_ix2 ON master_02 (key2)

[10 tables like the above...] 

With this config insertion on the "today's" table will be slow
at the end of the day, because updating the 2 indexes will be
very slow (they will be getting very large).

So I thought I could make, on "today's table", instead of the 2
indexes on the whole table, something like:

CREATE INDEX master_10_1_ix1 ON  master_10 (key1)
WHERE (ts >= DATETIME '2006-03-10 00:00'  and
ts < DATETIME '2006-03-10 01:00')

(same thing for second indexed column)

CREATE INDEX master_10_2_ix1 ON  master_10 (key1)
WHERE (ts >= DATETIME '2006-03-10 01:00'  and
ts < DATETIME '2006-03-10 02:00')

(same thing for second indexed column)

[other 22 indexes definition like the above, one per hour...]

That is, the table where data will be inserted (ts will always be
ascending, so I will always insert data in the latest table)
will have multiple small indexes.
Then, at night, the small indexes would be dropped after one big
index has been created (since no more rows will be inserted in that
table, I don't care if the index is big).

So, a query like:

select * from master where key1=938479 
and ts between now() and "now()-10 minutes"

would use the proper index on the "today's" table;

a query like:

select * from master where key1=938479 
and ts between "3 days ago" and "2 days ago"

would use the indexes in table "today minus 2 days" and
"today minus 3 days"

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general