roopa perumalraja wrote:
Hi all,
As I am inserting 100million rows daily into partitioned tables
(daily wise), it is getting slower.
What is - the inserts? By how much? What tables? What indexes? How
are you inserting these rows?
I take my words back as 100million rows. The insert of 20million rows
everyday takes only 10minutes as I use copy statement to copy into
temperory table from flat files then do some manipulation to the data
& insert it into the paritioned tables. I have solved the problem. I
have paritioned the tables date-wise. (Is partitioning the tables
monthly is recommanded?)
Depends upon your usage - some people partition daily if they have a lot
of incoming data.
The proble is with another insert, selecting data from one of the
paritioned tables and doing some calculations then inserting into
another table. That is around 280000 rows every day. This takes
really a long time (almost a day) if somebody is doing something with
the database. If nobody is using the database then it takes almost
two hours. Even a select statement to other tables in the datbase
affects this insert. While inserting I use
BEGIN WORK SELECT foo1 LOCK TABLE foo2 IN EXCLUSIVE MODE INSERT INTO
foo2 (SELECT......) COMMIT WORK
All the tables are indexed. I am using 4 indexes including the pkey
index. Will dropping the index before inserting and reindexing it
after the insert will help?
Maybe. It is often quicker to recreate an index if you are updating lots
of rows.
Even the retrivel of data, select statement on those tables takes
about 30 mintues.
All selects take 30 minutes, regardless what work they do? Or do
you have specific selects that are causing problems?
Any statement more than one running on database takes time. Say for
example if I am inserting 20million rows into one table & at the same
time if I try updating into another table, that takes a lot of time.
You haven't given details of the select statements, or what the
bottleneck is in your system. It sounds like it might be disk I/O. Have
you tried monitoring activity with vmstat/iostat (assuming you're on a
Linux/Unix system)?
I have tried increasing the parameters in postgres.conf but still
that doesn't help me much as the no of rows are huge.
What parameters, to what values? What hardware are you running on?
What load do you place on it?
parameters in postgres.conf which I increased are
fsysn on
> wal_buffers 128
> checkpoint_segments 256 (2 GB)
checkpoint_timeout 3600 (1 hour)
How did you calculate these? Is this based on your bulk data load.
> work_mem: set to 128MB
maintenance_work_mem: to 512MB
I assume you have enough memory to support these settings? Something
over 4GB, yes?
I wanted to increase shared_buffer to 60,000 but I am not able to
restart the database if I change it even to 10000. It says 'could not
start postmaster'. (max_connection is 100.)
"It" says? What do the logs say?
Will the idea of storing the database in cluster (memory)
increase the performance of insert/update/select in the table in
a suitation like this? Thanks a lot in advance.
A better question would be "Why isn't my data being cached?". The
operating-system should cache regularly used files.
So I think the problem is when two things are happening in database,
it takes a long time. Thats the reason I was wondering if storing the
database in cluster will solve the problem.
The data should be cached anyway. That's the point.
You still haven't told us any of the information needed to help you.
Some of the questions we still need answers for:
1. What queries are giving you trouble, and how many rows do they deal with?
2. What hardware is the database running on?
3. What load is on the system (try "vmstat 10" or similar)
--
Richard Huxton
Archonet Ltd