On Tue, 8 May 2007, ismo.tuononen@xxxxxxxxxxx wrote:
On Tue, 8 May 2007, Pomarede Nicolas wrote:
As you can see, with hundreds of thousands events a day, this table will need
being vaccumed regularly to avoid taking too much space (data and index).
Note that processing rows is quite fast in fact, so at any time a count(*) on
this table rarely exceeds 10-20 rows.
For the indexes, a good way to bring them to a size corresponding to the
actual count(*) is to run 'reindex'.
why you have index in table where is only 10-20 rows?
are those indexes to prevent some duplicate rows?
I need these indexes to sort rows to process in chronological order. I'm
also using an index on 'oid' to delete a row after it was processed (I
could use a unique sequence too, but I think it would be the same).
Also, I sometime have peaks that insert lots of data in a short time, so
an index on the event's date is useful.
And as the number of effective row compared to the number of dead rows is
only 1%, doing a count(*) for example takes many seconds, even if the
result of count(*) is 10 row (because pg will sequential scan all the data
pages of the table). Without index on the date, I would need sequential
scan to fetch row to process, and this would be slower due to the high
number of dead rows.
I have some tables just to store unprosessed data, and because there is
only few rows and I always process all rows there is no need for
indexes. there is just column named id, and when I insert row I take
nextval('id_seq') :
insert into some_tmp_table(id,'message',...) values (nextval('id_seq'),'do
something',...);
I know that deleting is slower than with indexes, but it's still fast
enough, because all rows are in memory.
and that id-column is just for delete, it's unique and i can always delete
using only it.
Ismo
Nicolas