Re: COPY command details

I have one system where I have used partitioning. In this particular case I have tons of data spanning about 50 years. What I did was write a small loader that breaks the data into tables based on date, so I have tables like abc_2000, abc_2001, etc. The loading script is only a couple hundred lines of code. The only part that was a little bit of work was making it easy for the devs to access the data. I did this by writing a few PL functions that automatically union the tables and produce results. So a function like getData(startDate, endDate) would run a union query over the respective date ranges.
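A stripped-down sketch of the idea (table and column names are simplified stand-ins here, not the real schema) looks roughly like this:

-- Rough sketch: assumes yearly tables abc_2000, abc_2001, ... that all
-- share the same layout, here (ts timestamp, payload text).
CREATE OR REPLACE FUNCTION getData(startDate date, endDate date)
RETURNS SETOF abc_2000 AS $$
DECLARE
    q   text := '';
    rec abc_2000%ROWTYPE;
BEGIN
    -- Build "SELECT * FROM abc_2000 UNION ALL SELECT * FROM abc_2001 ..."
    -- covering only the years touched by the requested range.
    FOR yr IN EXTRACT(year FROM startDate)::int .. EXTRACT(year FROM endDate)::int LOOP
        IF q <> '' THEN
            q := q || ' UNION ALL ';
        END IF;
        q := q || 'SELECT * FROM abc_' || yr::text
               || ' WHERE ts BETWEEN ' || quote_literal(startDate::text)
               || ' AND ' || quote_literal(endDate::text);
    END LOOP;

    FOR rec IN EXECUTE q LOOP
        RETURN NEXT rec;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

The devs then just call SELECT * FROM getData('2000-03-01', '2001-06-30') and never have to know which yearly tables sit behind it.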

Benjamin

P.S. If anything in the data access portion can be improved, please let me know and feel free to voice your opinions. I am always looking for new ways to make this particular database faster.

Gerald Timothy G Quimpo wrote:
On Thu, 2007-03-29 at 21:30 -0700, Benjamin Arai wrote:
Rebuilding an index can't be the PostgreSQL solution for all cases. I am dealing with databases in the hundreds of gigabytes range and I am adding about 10GB of data a week. At some point it's going to take longer than a week to rebuild all of the indexes in the database.

On the other hand, if I partition the data into several tables, it might not be such a big deal since I am only adding and never deleting... It does make things a little more of a pain in the ass, though.

I am leaning toward a policy of always partitioning large tables. I haven't found the time to do it properly yet; I'm still thinking about it and hoping that someone who's done it will chime in with their rules of thumb.

Like Benjamin, I have a database that is close to 600GB for 2.25 years of data, and if I were to add the other 4 years of data that we have archived away, it will easily go into the terabyte range. There are a few individual tables which approach 100GB all by themselves.

As it is, I can't afford to reindex or even back up (with pg_dump or any other method) or run other administrative tasks on those tables, since the processes take too long. There are workarounds: I could back up single tables at slack times, which would allow me to do a complete backup (though not self-consistent as a set) over the course of a week or so.
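The kind of single-table workaround I have in mind (just an illustration with made-up names and paths, and keeping to the COPY command this thread started on) would be something like:

-- Dump one table's data server-side at a slack time.  COPY ... TO
-- writes the file on the database server and needs superuser; it saves
-- data only, not the table definition or its indexes, so a per-table
-- pg_dump -t is the other obvious way to do it.
COPY some_big_table TO '/backups/some_big_table.copy';

-- Later, to restore into an already-created empty table of the same shape:
-- COPY some_big_table FROM '/backups/some_big_table.copy';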

So I'm leaning toward partitioning, perhaps selecting partition rules so that no table will be larger than around 5GB, at which point reindex and other admin procedures that take exclusive locks become only minor inconveniences rather than showstoppers.
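The layout I have in mind is the usual inheritance-plus-CHECK-constraint scheme (a rough sketch with made-up names, not my actual schema):

-- Empty parent, one child per year (or whatever interval keeps each
-- child under the ~5GB target), attached via inheritance.
CREATE TABLE events (
    id      bigint,
    ts      timestamp NOT NULL,
    payload text
);

CREATE TABLE events_2006 (
    CHECK (ts >= '2006-01-01' AND ts < '2007-01-01')
) INHERITS (events);

CREATE TABLE events_2007 (
    CHECK (ts >= '2007-01-01' AND ts < '2008-01-01')
) INHERITS (events);

-- With constraint_exclusion on, queries against the parent that filter
-- on ts only scan the children whose CHECK constraints can match, so a
-- REINDEX (or anything else taking an exclusive lock) on one small
-- child doesn't hold up the rest.
SET constraint_exclusion = on;
SELECT count(*) FROM events WHERE ts >= '2007-03-01' AND ts < '2007-04-01';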

How do people take consistent backups of very large databases on Linux/FreeBSD? I'm aware of PITR, but I might not be able to set aside a box with enough drives for it. LVM snapshots? Are there performance issues with LVM, etc.?

tiger


