Job,

* Job (Job@xxxxxxxxxxxxxxxxxxxx) wrote:
> we are planning to store historical data into a PostgreSQL 9.6 table.

The question is less about what you're storing in PG and more about what
you're going to be doing with that data.

> We see in the PostgreSQL limits that it can handle "big data". In
> fact, the limit for a single table is 32TB.

Very few people would advocate storing a single, non-partitioned, 32TB
table in any database system. As with most other database technologies,
you'll want to partition up your data. The general rule of thumb is to
partition at the hundreds-of-millions-of-tuples level, which tends to
lead to tables somewhere between 5GB and 100GB. At 100GB per table, you
would end up with a few hundred tables to get up to 32TB, which is quite
manageable in PG. Indeed, PG can support many more tables than that, but
putting more than a few hundred into a single inheritance structure
using constraint exclusion for the partitioning can lead to long
planning times. (A rough sketch of such a layout is at the end of this
mail.)

> We need to archive this data to generate reports and analysis views.

Is the historical data changing? Are the queries changing? Will you
primarily be performing queries which look at all of the data?

> Has anyone experienced PostgreSQL 9.6 with a database and tables
> bigger than four or five terabytes?

Yes.

> Which hardware requirements have been used?

That depends a great deal on what you're actually doing.

> Were there problems or bottlenecks with such big data?

If you're looking to run queries against 4 or 5TB at a time where all of
the tuples have to be considered (no index-based lookups, etc), then
you'll want to be doing parallel work. With 9.6, you could try the
parallel query capability (second sketch below), but even with 9.5 and
earlier you could pretty trivially write your own parallel query system
by using multiple connections to the database, and that works very well.

The only downside to using multiple connections is if you have ongoing
changes happening and you need a consistent view of the data. That's
still possible with multiple processes, but you have to do some prep
work ahead of time: extract the keys for all of the records you want to
process in a given run and store them in a static side-table that the
parallel processes then use (third sketch below). You have to avoid
removing records that are included in that set, of course, and depending
on your requirements you might wish to avoid updating them as well.

Thanks!

Stephen
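Here's the first sketch: an inheritance-based partitioning layout of the
kind described above, assuming a hypothetical "measurements" table
partitioned by month (all table and column names here are invented for
illustration):

    CREATE TABLE measurements (
        id          bigserial,
        recorded_at timestamptz NOT NULL,
        value       double precision
    );

    -- One child table per month.  The CHECK constraint is what allows
    -- the planner to exclude a partition when the WHERE clause can't
    -- possibly match it.
    CREATE TABLE measurements_2016_01 (
        CHECK (recorded_at >= '2016-01-01' AND recorded_at < '2016-02-01')
    ) INHERITS (measurements);

    CREATE TABLE measurements_2016_02 (
        CHECK (recorded_at >= '2016-02-01' AND recorded_at < '2016-03-01')
    ) INHERITS (measurements);

    -- 'partition' is the default setting; it enables constraint
    -- exclusion for queries against inheritance parents.
    SET constraint_exclusion = partition;

    -- Only measurements_2016_01 should be scanned here.
    EXPLAIN SELECT avg(value)
              FROM measurements
             WHERE recorded_at >= '2016-01-10'
               AND recorded_at <  '2016-01-20';

In practice you'd also put a trigger on the parent to route INSERTs to
the right child, and create indexes on each child as needed.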
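Second, a minimal example of trying 9.6's built-in parallel query
support (same hypothetical table as above). max_parallel_workers_per_gather
defaults to 0 in 9.6, so you have to raise it before the planner will
consider parallel plans:

    -- Allow up to 4 background workers per Gather node (bounded by
    -- max_worker_processes, which defaults to 8).
    SET max_parallel_workers_per_gather = 4;

    -- Big aggregations over the whole table are the typical win; look
    -- for Gather / Parallel Seq Scan nodes in the plan.
    EXPLAIN SELECT count(*), avg(value) FROM measurements;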
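And third, a sketch of the static side-table approach for getting a
consistent view across multiple worker connections. The "run_keys"
table and the modulo-based split are just one way to divide the work,
not the only one:

    -- Prep step, done once: snapshot the keys for this run into a
    -- static side-table.
    CREATE TABLE run_keys AS
        SELECT id
          FROM measurements
         WHERE recorded_at < '2016-07-01';

    -- Each of, say, 4 worker connections then processes a disjoint
    -- slice; worker k uses its own remainder (0..3) in place of 0.
    SELECT m.*
      FROM measurements m
      JOIN run_keys k USING (id)
     WHERE k.id % 4 = 0;   -- worker 0

As long as nothing deletes (or, depending on the processing, updates)
the rows named in run_keys while the workers run, every worker sees the
same set of records and no record is handled twice.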