On Wednesday, April 16, 2014, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Tue, Apr 15, 2014 at 6:36 PM, Nick Eubank <nickeubank@xxxxxxxxx> wrote:
Hi all,A few years ago someone said postgres windows can't set working_mem above about 2 GB (www.postgresql.org/message-id/17895.1315869622@xxxxxxxxxxxxx -- seems to be same for maintenance_working_mem ). Im finding limit still present.I'm doing single user, single connection data intensive queries and would like to set a higher value on windows to better use 16gb built in ram (don't control platform, so can't jump to Linux).Anyone found a work around?Before worrying much about that, I'd just give it a try at the highest value it will let you set and see what happens.If you want to do something like hashed aggregate that would have been predicted to fit in 6GB but not in 1.999GB, then you will lose out on the hash agg by not being able to set the memory higher. On the other hand, if your queries want to use sorts that will spill to disk anyway, the exact value of work_mem usually doesn't matter much as long as it not absurdly small (1MB absurdly small for analytics, 64MB is probably not). In fact very large work_mem can be worse in those cases, because large priority queue heaps are unfriendly to the CPU cache. (Based on Linux experience, but I don't see why that would not carry over to Windows)Frankly I think you've bitten off more than you can chew. 600GB of csv is going to expand to probably 3TB of postgresql data once loaded. If you can't control the platform, I'm guessing your disk array options are no better than your OS options are.ACID compliance is expensive, both in storage overhead and in processing time, and I don't think you can afford that and probably don't need it. Any chance you could give up on databases and get what you need just using pipelines of sort, cut, uniq, awk, perl, etc. (or whatever their Window equivalent is)?Cheers,Jeff
Thanks Jeff -- you're clearly correct that SQL is not the optimal tool for this, as I'm clearly leaning. I just can't find anything MADE for one-user big data transformations. :/ I may resort to that kind of pipeline approach, I just have so many transformations to do I was hoping I could use a declarative language in something.
But your point about hash map size is excellent. No idea how big an index for this would be...