> -----Original Message-----
> From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Artur
> Sent: Monday, June 15, 2009 5:00 AM
> To: pgsql-general@xxxxxxxxxxxxxx
> Subject: 10 TB database
>
> Hi!
>
> We are thinking of building a stock-related search engine.
> It is an experimental project, just for fun.
>
> The problem is that we expect more than 250 GB of data every month,
> spread across two tables: about 50,000,000 new rows every month.
>
> We want access to all of the data, mostly for generating
> user-requested (aggregating) reports.
> We would have about 10 TB of data in three years.
>
> Do you think it is possible to build this with PostgreSQL, and do you
> have any idea how to start? :)

Consider summarizing this data into a data warehouse. Most of the data
will be historical, and therefore the vast majority of it will be
read-mostly (the rare write operations will probably consist mostly of
corrections). You won't want to scan the whole 10 TB every time you run
a summarization query. (A sketch of such a rollup appears at the end of
this message.)

I have an idea that might make an interesting experiment:

1. Create one table per year/month combination.
2. Create views that combine each year's twelve monthly tables into one
   yearly view.
3. Create a view that combines all the yearly views into one global view.

(This layout is also sketched at the end of this message.)

The reason I think this suggestion may have some merit is that the
historical trends will not need to be recalculated on a daily basis,
although it would be nice to be able to run calculations against the
whole pile at will on rare occasions. Maintaining separate tables by
month also reduces the average depth of the b-tree indexes.

I would guess that, for the most part, the active calculations will run
only against recent data (e.g. the past six months to one year or so).

It could also be interesting to combine the N most recent months of
data, where N is supplied on the fly. A plain view cannot take a
parameter, but a set-returning function can do the same job (also
sketched below).

If you are going to collect a terrific volume of data like this, I
suggest coupling a mathematics package (SAS, R, Octave, Scilab, Maxima,
etc.) with the database so that you can effectively support decisions
derived from the data.

You are also going to need high-end hardware to support a database like
this.

Just some ideas you might like to test when you start fooling around
with this data. IMO-YMMV
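First, the warehouse rollup. This is a minimal, untested sketch, and
every name in it (ticks_2009_06, ticks_daily_summary, symbol,
trade_time, price, volume) is invented for illustration, since I don't
know your schema:

-- Hypothetical raw tick table for one month; the real schema will differ.
CREATE TABLE ticks_2009_06 (
    symbol     text          NOT NULL,
    trade_time timestamptz   NOT NULL,
    price      numeric(12,4) NOT NULL,
    volume     bigint        NOT NULL
);
CREATE INDEX ticks_2009_06_time_idx ON ticks_2009_06 (trade_time);

-- One compact row per symbol per day; reports read this table,
-- not the raw ticks.
CREATE TABLE ticks_daily_summary (
    symbol       text NOT NULL,
    trade_day    date NOT NULL,
    high_price   numeric(12,4),
    low_price    numeric(12,4),
    total_volume bigint,
    tick_count   bigint,
    PRIMARY KEY (symbol, trade_day)
);

-- Run once per month, after that month's raw data has stopped changing.
INSERT INTO ticks_daily_summary
SELECT symbol,
       trade_time::date AS trade_day,
       max(price),
       min(price),
       sum(volume),
       count(*)
FROM   ticks_2009_06
GROUP  BY symbol, trade_time::date;

Reports then read a few rows per symbol per day instead of millions of
raw ticks.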
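Next, the monthly-table/yearly-view layout, continuing with the same
invented ticks_* naming. Each monthly table is defined exactly like
ticks_2009_06 above:

-- A yearly view gluing the twelve monthly tables together.
CREATE VIEW ticks_2009 AS
          SELECT * FROM ticks_2009_01
UNION ALL SELECT * FROM ticks_2009_02
-- ... one UNION ALL branch per month ...
UNION ALL SELECT * FROM ticks_2009_12;

-- A global view gluing the yearly views together.
CREATE VIEW ticks_all AS
          SELECT * FROM ticks_2007
UNION ALL SELECT * FROM ticks_2008
UNION ALL SELECT * FROM ticks_2009;

A query against ticks_all touches every monthly table, so save it for
the rare whole-pile calculations; routine reports should go to the
summary table or to individual months.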
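Finally, the "N most recent months" idea, done as a PL/pgSQL
set-returning function rather than a view. Again an untested sketch,
assuming the ticks_YYYY_MM naming convention above; the function name
is mine:

-- Stands in for the parameterized view; borrows ticks_2009_06's row
-- type as the shape shared by all the monthly tables.
CREATE OR REPLACE FUNCTION ticks_recent(n integer)
RETURNS SETOF ticks_2009_06 AS $$
DECLARE
    m   timestamptz;
    rec ticks_2009_06%ROWTYPE;
BEGIN
    -- Walk backwards n months from the current month, pulling each
    -- month's rows out of its table by the table's generated name.
    FOR i IN 0 .. n - 1 LOOP
        m := date_trunc('month', now()) - (i * interval '1 month');
        FOR rec IN EXECUTE
            'SELECT * FROM ticks_' || to_char(m, 'YYYY"_"MM')
        LOOP
            RETURN NEXT rec;
        END LOOP;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- For example, aggregate over the last six months:
--   SELECT symbol, max(price) FROM ticks_recent(6) GROUP BY symbol;

One caveat: the function materializes its entire result set before the
caller sees any rows, so with 50,000,000-row months you would want to
push the aggregation down inside the function rather than aggregating
over its output.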