I have one system on which I have used partitioning. For this
particular case I have tons of data spanning about 50 years. What I
did was write a small loader that breaks the data into tables based
on date, so I have tables like abc_2000, abc_2001, etc. The loading
script is only a couple hundred lines of code. The only part that was
a little bit of work was allowing easy access to the data for the
devs. I did this by writing a few PL functions that automatically
union the tables and produce results, so a function like
getData(startDate, endDate) runs a union query over the respective
date range.
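
In case it helps, the function looks roughly like this. This is just
a simplified sketch, not the real code: the table name abc, the
entry_date column, and the assumption that there is a parent table
(or composite type) abc describing the shared row layout are all
placeholders.

-- Assumes a parent table (or type) "abc" defining the shared row layout.
CREATE OR REPLACE FUNCTION getData(startDate date, endDate date)
RETURNS SETOF abc AS $$
DECLARE
    q   text := '';
    rec abc%ROWTYPE;
BEGIN
    -- Build a UNION ALL over the yearly tables covered by the range.
    FOR yr IN EXTRACT(year FROM startDate)::int ..
              EXTRACT(year FROM endDate)::int LOOP
        IF q <> '' THEN
            q := q || ' UNION ALL ';
        END IF;
        q := q || 'SELECT * FROM abc_' || yr
               || ' WHERE entry_date BETWEEN '
               || quote_literal(startDate::text)
               || ' AND ' || quote_literal(endDate::text);
    END LOOP;

    -- Stream the rows back to the caller.
    FOR rec IN EXECUTE q LOOP
        RETURN NEXT rec;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

The devs then just call

    SELECT * FROM getData('2001-01-01', '2003-12-31');

and never need to know which yearly tables sit behind it.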
Benjamin
P.S. If anything I am doing for the data access portion can be
improved, please let me know and feel free to voice your opinions. I
am always looking for new ways to make this particular database
faster.
Gerald Timothy G Quimpo wrote:
On Thu, 2007-03-29 at 21:30 -0700, Benjamin Arai wrote:
Rebuilding an index can't be the PostgreSQL solution for all
cases. I am dealing with databases in the hundreds-of-gigs
range and I am adding about 10 gigs of data a week. At
some point it's going to take longer than a week to rebuild
all of the indexes in the database.
On the other hand, if I partition the data into several
tables, then it might not be such a big deal, since I am
only adding and never deleting... though it does make
things a little more of a pain in the ass.
I am leaning toward a policy of always partitioning large
tables. I haven't found the time to do it properly yet; I'm
still thinking about it and hoping that someone who's done
it will chime in with their rules of thumb.
Like Benjamin, I have a database that is close to 600GB
for 2.25 years of data, and if I were to add the other
4 years of data that we have archived away, it would easily
go into the terabyte range. There are a few individual
tables which approach 100GB all by themselves.
As it is, I can't afford to reindex, back up (with pg_dump
or any other method), or run other administrative tasks on
those tables, since the processes take too long. There are
workarounds: I could back up single tables at slack times,
which would allow me to do a complete backup (though not
self-consistent as a set) over the course of a week or so.
So I'm leaning toward partitioning, perhaps selecting
partition rules so that no table will be larger than
around 5GB, at which point reindexing or other admin
procedures that take exclusive locks become only minor
inconveniences rather than showstoppers.
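
Something like the usual inheritance-plus-CHECK-constraint setup,
I imagine. Just as a rough sketch (the names logs and log_time and
the quarterly split below are made up, not my actual schema):

-- Parent table holds no data; each child covers a bounded date
-- range, so no single table grows past a few GB.
CREATE TABLE logs (
    log_time timestamp NOT NULL,
    payload  text
);

CREATE TABLE logs_2007_q1 (
    CHECK (log_time >= '2007-01-01' AND log_time < '2007-04-01')
) INHERITS (logs);

CREATE TABLE logs_2007_q2 (
    CHECK (log_time >= '2007-04-01' AND log_time < '2007-07-01')
) INHERITS (logs);

-- Index each child separately; REINDEX then locks only one small table.
CREATE INDEX logs_2007_q1_time ON logs_2007_q1 (log_time);
CREATE INDEX logs_2007_q2_time ON logs_2007_q2 (log_time);

-- With constraint_exclusion on, queries against the parent skip the
-- children whose CHECK constraints rule out the requested range.
SET constraint_exclusion = on;
SELECT * FROM logs
 WHERE log_time >= '2007-05-01' AND log_time < '2007-06-01';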
How do people take consistent backups of very large
databases on Linux/FreeBSD? I'm aware of PITR, but I
might not be able to set aside a box with enough
drives for it. LVM snapshots? Performance issues with
LVM, etc.?
tiger