On Tuesday 26 January 2010 01:39:48 nair rajiv wrote:
> On Tue, Jan 26, 2010 at 1:01 AM, Craig James <craig_james@xxxxxxxxxxxxxx> wrote:
>
> I am working on a project that will take out structured content from
> Wikipedia and put it in our database. Before putting the data into the
> database I wrote a script to find out the number of rows every table
> would have after the data is in, and I found there is a table which
> will have approximately 50,000,000 rows after data harvesting.
> Is it advisable to keep so much data in one table?

That depends on your access patterns, i.e. how many rows you access at the
same time and whether those rows have some common locality.

> I have read about 'partitioning' a table. Another idea I have is to
> break the table into different tables after the number of rows in a
> table has reached a certain limit, say 10,00,000. For example, dividing
> a table 'datatable' into 'datatable_a', 'datatable_b', each having
> 10,00,000 rows.
> I need advice on whether I should go for partitioning or the approach I
> have thought of.

Your approach is pretty close to partitioning, except that partitioning
makes the split mostly invisible to the outside, so it is imho preferable.

> We have an HP server with 32GB RAM and 16 processors. The storage has
> 24TB diskspace (1TB/HD). We have put them on RAID-5. It would be great
> if we could know the parameters that can be changed in the postgres
> configuration file so that the database makes maximum utilization of
> the server we have, e.g. parameters that would increase the speed of
> inserts and selects.

Not using RAID-5 would possibly be a good start - many people (me included)
have experienced bad write performance on it. It depends a great deal on the
controller/implementation though.
RAID-10 is normally considered the better choice despite its lower ratio of
usable space.
Did you create one big RAID-5 out of all disks?
That's not a good idea, because it is pretty likely that another disk fails
while you are restoring a previously failed disk. Unfortunately, in that
configuration that means you have lost all of your data (in the most common
implementations at least).

Andres

PS: Your lines are strangely wrapped...

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
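
To put rough numbers on the RAID-5 versus RAID-10 trade-off discussed in the message above, here is a minimal Python sketch for the 24 x 1TB array from the question. It is only an illustration under assumptions that are not in the thread: a single array spanning all disks, RAID-10 built from two-way mirror pairs, and a second failure that hits a uniformly random surviving disk. The function names `usable_tb` and `second_failure_fatal` are made up for this example.

```python
# Sketch: compare RAID-5 and RAID-10 for one array of identical disks.
# Assumptions (not from the thread): a single array over all disks, RAID-10
# made of two-way mirror pairs, and a second failure hitting a uniformly
# random surviving disk while the first one is being rebuilt.
from fractions import Fraction


def usable_tb(level: str, disks: int, disk_tb: int = 1) -> int:
    """Usable capacity in TB for a single array of identical disks."""
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return (disks - 1) * disk_tb  # one disk's worth of space holds parity
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID-10 needs an even number of disks, at least 4")
        return (disks // 2) * disk_tb  # every block is stored twice
    raise ValueError(f"unknown RAID level: {level}")


def second_failure_fatal(level: str, disks: int) -> Fraction:
    """Chance that a second disk failure during a rebuild destroys the array."""
    usable_tb(level, disks)  # reuse the argument validation above
    if level == "raid5":
        return Fraction(1)  # any second failure loses the whole array
    return Fraction(1, disks - 1)  # only the failed disk's mirror partner is fatal


if __name__ == "__main__":
    for level in ("raid5", "raid10"):
        print(level, usable_tb(level, 24), "TB usable,",
              "second failure fatal with probability",
              second_failure_fatal(level, 24))
```

For 24 disks this gives 23 TB usable for RAID-5 against 12 TB for RAID-10, while a second failure during a rebuild is always fatal for the single big RAID-5 but fatal in only 1 of 23 cases for RAID-10. The sketch says nothing about how likely a second failure is in the first place, which depends on rebuild time and on the disks and controller involved.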