On Fri, Mar 13, 2015 at 4:03 PM, Varadharajan Mukundan <srinathsmn@xxxxxxxxx> wrote:
>> We might even consider taking expert advice on how to tune queries and
>> the server, but if Postgres is going to behave like this, I am not sure
>> we would be able to continue with it.
>>
>> Having said that, I would say again that I am completely new to this
>> territory, so I might miss lots and lots of things.
>
> My two cents: Postgres out of the box might not be a good choice for
> data warehouse style queries, because it is optimized to run
> thousands of small queries (OLTP style processing) and not one big
> monolithic query. I've faced similar problems myself before, and here
> are a few tricks I followed to get my elephant to do real-time ad hoc
> analysis on a table with ~45 columns and a few billion rows in it.
>
> 1. Partition your table! Use constraint exclusion to the fullest extent.
> 2. Fire multiple small queries distributed over the partitions and
> aggregate them at the application layer. This is needed because you
> want to exploit all your cores to the fullest extent (assuming you've
> got enough memory for an effective FS cache). If your dataset goes
> beyond the capability of a single system, try something like Stado
> (GridSQL).
> 3. Storing indexes on a RAM disk / faster disk (using tablespaces) and
> using them properly makes the system blazing fast. CAUTION: this
> requires some extra infrastructure setup for backup and recovery.
> 4. If you're accessing a small set of columns in a big table and if
> you feel compressing the data helps a lot, give this FDW a try -
> https://github.com/citusdata/cstore_fdw

Agreed here. If you're gonna run reporting queries against PostgreSQL you
have to optimize for fast seq scan stuff, i.e. an IO subsystem that can
read a big table at hundreds of megabytes per second, gigabytes per second
if you can get it. A lot of spinning drives on a fast RAID card or good
software RAID can do this on the cheap, since a lot of times you don't
need big drives if you have a lot of them. 24 cheap 1TB drives that can
each read at ~100 MB/s can gang up on the data, and you can read 100GB in
well under a minute. But you can't deny physics: if you need to read a 2TB
table it's going to take time.

If you're only running 1 or 2 queries at a time, you can crank up work_mem
to something crazy like 1GB even on an 8GB machine. Stopping sorts from
spilling to disk, or at least giving queries a big playground to work in,
can make a huge difference. If you're gonna give big work_mem then
definitely limit connections to a handful. If you need a lot of persistent
connections then use a pooler.

The single biggest mistake people make in setting up reporting servers on
PostgreSQL is thinking that the same hardware that worked well for
transactional stuff (a handful of SSDs and lots of memory) will do just as
well when you're working with TB data sets. The hardware you need isn't
the same, and reusing the transactional setup for a reporting server is
gonna result in sub-optimal performance.

A few rough sketches of the partitioning, tablespace, cstore_fdw and
work_mem ideas above follow after my sig.

--
To understand recursion, one must first understand recursion.
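
For point 1 in the list above, a minimal sketch of what partitioning looks
like on the 9.x series (no declarative partitioning yet): a parent table,
child tables carrying CHECK constraints, and constraint exclusion so the
planner skips partitions that can't match. The table, columns and date
ranges are made up for illustration; you'd still need a trigger or
application-side routing to land inserts in the right child.

-- parent table; children carry CHECK constraints the planner can prune on
CREATE TABLE events (
    event_time  timestamptz NOT NULL,
    user_id     bigint,
    payload     text
);

CREATE TABLE events_2015_03 (
    CHECK (event_time >= '2015-03-01' AND event_time < '2015-04-01')
) INHERITS (events);

CREATE TABLE events_2015_04 (
    CHECK (event_time >= '2015-04-01' AND event_time < '2015-05-01')
) INHERITS (events);

-- 'partition' is the default; just make sure nobody has turned it off
SET constraint_exclusion = partition;

-- with the CHECKs in place this should only scan events_2015_03
SELECT count(*) FROM events
WHERE event_time >= '2015-03-10' AND event_time < '2015-03-11';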
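
For point 3, a sketch of putting just the indexes on a faster device via a
tablespace. The path and names are invented, and as the caution above says,
the extra directory has to be folded into your backup and recovery plans.

-- the directory must already exist and be owned by the postgres OS user
CREATE TABLESPACE fast_ix LOCATION '/mnt/ssd/pg_indexes';

-- the heap stays where it is; only the index lands on the fast device
CREATE INDEX events_user_id_ix ON events (user_id) TABLESPACE fast_ix;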
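
For point 4, this is roughly how cstore_fdw gets wired up, going from
memory of its README (it also wants cstore_fdw added to
shared_preload_libraries and a restart); the foreign table definition is
only illustrative.

CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE FOREIGN TABLE events_archive (
    event_time  timestamptz,
    user_id     bigint,
    payload     text
) SERVER cstore_server
  OPTIONS (compression 'pglz');

-- bulk load with COPY, then query only the columns you actually need
COPY events_archive FROM '/tmp/events.csv' WITH (FORMAT csv);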
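
And on the work_mem point: rather than raising it globally, it can be
bumped per session right before the big query, or pinned to a dedicated
reporting role so regular app connections keep the small default. The role
name and numbers here are only illustrative; remember work_mem is per
sort/hash step, not per connection, which is exactly why the connection
count has to stay small.

-- per-session, right before the reporting query
SET work_mem = '1GB';

-- or tie it to a dedicated role and cap how many such sessions can pile up
ALTER ROLE reporting SET work_mem = '1GB';
ALTER ROLE reporting CONNECTION LIMIT 2;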