* Igor Chudov (ichudov@xxxxxxxxx) wrote:
> Right now I have a personal (one user) project to create a 5-10
> Terabyte data warehouse. The largest table will consume the most space
> and will take, perhaps, 200,000,000 rows.

I run data-warehouse databases on that order (our current largest single
instance is ~4TB, running under 9.0.4). If the largest table is only
200M rows, PG should handle that quite well. Our data is partitioned by
month, each month is about 200M records, and simple queries run in 15-20
minutes (with a single thread), while complex windowing queries (split
up and run in parallel) finish in a couple of hours.

> However, while an hour is fine, two weeks per query is NOT fine.

What's really, really, really useful are two things: EXPLAIN, and this
mailing list. :)  Seriously, run EXPLAIN on your queries before you run
them and check whether the planned execution makes sense. Here's a real
easy hint: if it says "External Sort" and has big numbers, come talk to
us here - that's about one of the worst things you can possibly do. Of
course, PG will try to avoid that, but you may have written a query
(unintentionally) which forces PG to do a sort, or something else.

> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> CPU cores.

If you partition your data and don't mind things running in different
transactions, you can definitely get a speed boost with PG by running
things in parallel. PG handles that very well; in fact, if two queries
are scanning the same table, PG will combine them and read the data from
disk only once.

> I cannot shell out $47,000 per CPU for Oracle for this project.

The above data warehouse was migrated from an Oracle-based system. :)

> To be more specific, the batch queries that I would do, I hope,
> would either use small JOINS of a small dataset to a large dataset, or
> just SELECTS from one big table.
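As a rough sketch of what to look for (the table and column names here
are made up for illustration), the difference shows up right in the
plan:

```sql
-- Hypothetical fact/dimension tables - substitute your own.
EXPLAIN
SELECT f.*
FROM   fact_sales f
JOIN   dim_store  s ON s.store_id = f.store_id;

-- Good sign: a Hash Join, with the small table on the Hash side:
--   Hash Join
--     ->  Seq Scan on fact_sales f
--     ->  Hash
--           ->  Seq Scan on dim_store s

-- Bad sign (under EXPLAIN ANALYZE): "Sort Method: external merge"
-- with a large Disk: figure - that's the on-disk sort to avoid.
```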
Make sure that you set your 'work_mem' correctly - PG will use that to
figure out if it can hash the small table (you want that to happen,
trust me..). If you do end up having sorts, it'll also use the work_mem
value to figure out how much memory to use for sorting.

> So... Can Postgres support a 5-10 TB database with the use pattern
> stated above?

Yes, certainly.

	Thanks,

		Stephen
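For example (the values here are illustrative, not a recommendation -
note that work_mem applies per sort or hash operation, not per session,
so account for concurrency on a 48GB box):

```sql
-- Per-session, just before a big batch query:
SET work_mem = '512MB';

-- Or globally in postgresql.conf (every sort/hash node in every
-- backend can use up to this much, so be conservative here):
-- work_mem = 64MB
```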