I have one system on which I have used partitioning. For this
particular case I have tons of data spanning about 50 years. What I
did was write a small loader that breaks the data into tables based
on date, so I have tables like abc_2000, abc_2001, etc. The loading
script is only a couple hundred lines of code. The only part that was
a little bit of work was allowing easy access to the data for the
devs. I did this by writing a few PL functions that automatically
union the tables and produce results, so a function like
getData(startDate, endDate) runs a union query over the respective
date range.
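
In case it helps, the function looks roughly like this. This is just
a simplified sketch, not the real code: the table name abc, the
entry_date column, and the assumption that there is a parent table
(or composite type) abc describing the shared row layout are all
placeholders.

-- Assumes a parent table (or type) "abc" defining the shared row layout.
CREATE OR REPLACE FUNCTION getData(startDate date, endDate date)
RETURNS SETOF abc AS $$
DECLARE
    q   text := '';
    rec abc%ROWTYPE;
BEGIN
    -- Build a UNION ALL over the yearly tables covered by the range.
    FOR yr IN EXTRACT(year FROM startDate)::int ..
              EXTRACT(year FROM endDate)::int LOOP
        IF q <> '' THEN
            q := q || ' UNION ALL ';
        END IF;
        q := q || 'SELECT * FROM abc_' || yr
               || ' WHERE entry_date BETWEEN '
               || quote_literal(startDate::text)
               || ' AND ' || quote_literal(endDate::text);
    END LOOP;

    -- Stream the rows back to the caller.
    FOR rec IN EXECUTE q LOOP
        RETURN NEXT rec;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

The devs then just call

    SELECT * FROM getData('2001-01-01', '2003-12-31');

and never need to know which yearly tables sit behind it.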
Benjamin
P.S. If anything I am doing for the data access portion can be
improved, please let me know and feel free to voice your opinions. I
am always looking for new ways to make this particular database
faster.
Gerald Timothy G Quimpo wrote:
On Thu, 2007-03-29 at 21:30 -0700, Benjamin Arai wrote:
Rebuilding an index can't be the PostgreSQL solution for all
cases. I am dealing with databases in the hundreds-of-gigs
range and I am adding about 10 gigs of data a week. At
some point it's going to take longer than a week to rebuild
all of the indexes in the database.
On the other hand, if I partition the data into several
tables, then it might not be such a big deal, since I am
only adding and never deleting... though it does make
things a little more of a pain in the ass.
I am leaning toward a policy of always partitioning large
tables. I haven't found the time to do it properly yet; I'm
still thinking about it and hoping that someone who's done
it will chime in with their rules of thumb.
Like Benjamin, I have a database that is close to 600GB
for 2.25 years of data, and if I were to add the other
4 years of data that we have archived away, it would easily
go into the terabyte range. There are a few individual
tables which approach 100GB all by themselves.
As it is, I can't afford to reindex, back up (with pg_dump
or any other method), or run other administrative tasks on
those tables, since the processes take too long. There are
workarounds: I could back up single tables at slack times,
which would allow me to do a complete backup (though not
self-consistent as a set) over the course of a week or so.
So I'm leaning toward partitioning, perhaps selecting
partition rules so that no table will be larger than
around 5GB, at which point reindexing or other admin
procedures that take exclusive locks become only minor
inconveniences rather than showstoppers.
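
Something like the usual inheritance-plus-CHECK-constraint setup,
I imagine. Just as a rough sketch (the names logs and log_time and
the quarterly split below are made up, not my actual schema):

-- Parent table holds no data; each child covers a bounded date
-- range, so no single table grows past a few GB.
CREATE TABLE logs (
    log_time timestamp NOT NULL,
    payload  text
);

CREATE TABLE logs_2007_q1 (
    CHECK (log_time >= '2007-01-01' AND log_time < '2007-04-01')
) INHERITS (logs);

CREATE TABLE logs_2007_q2 (
    CHECK (log_time >= '2007-04-01' AND log_time < '2007-07-01')
) INHERITS (logs);

-- Index each child separately; REINDEX then locks only one small table.
CREATE INDEX logs_2007_q1_time ON logs_2007_q1 (log_time);
CREATE INDEX logs_2007_q2_time ON logs_2007_q2 (log_time);

-- With constraint_exclusion on, queries against the parent skip the
-- children whose CHECK constraints rule out the requested range.
SET constraint_exclusion = on;
SELECT * FROM logs
 WHERE log_time >= '2007-05-01' AND log_time < '2007-06-01';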
How do people take consistent backups of very large
databases on Linux/FreeBSD? I'm aware of PITR, but I
might not be able to set aside a box with enough
drives for it. LVM snapshots? Performance issues with
LVM, etc.?
tiger