Search Postgresql Archives

Re: How Big is Too Big for Tables?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 28, 2010 at 12:03 PM, Joshua D. Drake <jd@xxxxxxxxxxxxxxxxx> wrote:
> On Wed, 2010-07-28 at 11:09 -0600, Bill Thoen wrote:
>> I'm building a national database of agricultural information and one
>> of the layers is a bit more than a gigabyte per state. That's 1-2
>> million records per state, with a mult polygon geometry, and i've got
>> about 40 states worth of data. I trying to store everything in a
>> single PG table. What I'm concerned about is if I combine every state
>> into one big table then will performance will be terrible, even with
>> indexes? On the other hand, if I store the data in several smaller
>> files, then if a user zooms in on a multi-state region,  I've got  to
>> build or find a much more complicated way to query multiple files.
>>
>> So I'm wondering, should I be concerned with building a single
>> national size table (possibly 80-100 Gb) for all these records, or
>> should I keep the files smaller and hope there's something like
>> ogrtindex out there for PG tables? what do you all recommend in this
>> case?
>
> 80-100Gb isn't that much. However it may be worth looking into
> partitioning by state.
>

See http://archives.postgresql.org/pgsql-general/2010-07/msg00691.php
for details, but here is a summary.

My experience has not been the greatest. I have been trying to figure
out if I can store a few hundred million rows, and have experienced a
great number of problems.

One. Loading the data is a problem. COPY is the quickest way (I was
able to achieve a max of about 20,000 inserts per second). However,
you need to make sure there are no indexes, not even a primary key, in
order to extract maximum speed. That means, you have to load
*everything* in one go. If you load in stages, you have drop all the
indexes, then load, then rebuild the indexes. Next time you want to
load more data, you to repeat this process. Building the indexes takes
a long time, so experimenting is a chore.

Two. Partitioning is not the perfect solution. My database will
ultimately have about 13 million rows per day (it is daily data) for
about 25 years. So, I need either --

- One big table with 25 * 365 * 13 million rows. Completely undoable.
- 25 yearly tables with 365 * 13 million rows each. Still a huge
chore, very slow queries.
- 25 * 365 tables with 13 million rows each. More doable, but
partitioning doesn't work.

Three. At least, in my case, the overhead is too much. My data are
single bytes, but the smallest data type in Pg is smallint (2 bytes).
That, plus the per row overhead adds to a fair amount of overhead.

I haven't yet given up on storing this specific dataset in Pg, but am
reconsidering. It is all readonly data, so flat files might be better
for me.

In other words, Pg is great, but do tests, benchmark, research before
committing to a strategy. Of course, since you are storing geometries,
Pg is a natural choice for you. My data are not geometries, so I can
explore alternatives for it, while keeping my geographic data in Pg.

Hope this helps, or, at least provides an alternative view point.


> Sincerely,
>
> Joshua D. Drake
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux