Fair point. For now, I mean "just fast" - my queries currently take 5-15 seconds, but I'd like to get that down to the 1-2 second range.
From the query I provided, I have approximately 30,000 unique keys (what I called primary_id) that I'm grouping by, and each key has a series of numerical values for each of the type_ids. I'm computing averages, stddev and other statistics across a few hundred type_ids (the "where agg.type_id in ..." clause). The part of the query that varies is the user-specified list of type_ids, which makes it impossible to precalculate my statistics.
I'd like this to eventually scale to a million unique keys, and a thousand type_ids.
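For context, the shape of the query is roughly the following (just a sketch reconstructed from my description above; the "value" column and the example type_ids are placeholders, only agg, primary_id and type_id come from my actual schema):

    SELECT agg.primary_id,
           avg(agg.value)    AS avg_value,
           stddev(agg.value) AS stddev_value
    FROM   agg
    WHERE  agg.type_id IN (101, 102, 103)   -- user-specified, a few hundred ids
    GROUP  BY agg.primary_id;               -- ~30,000 groups today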
For now Postgres has been great for modeling the data, understanding where I hit performance bottlenecks, and providing a fast enough user interface. But I'm definitely starting to think about whether I can cache my data (with millions of keys and thousands of type_ids, the data might be too large), and whether to look into distributed databases (even though I can't precompute the stats, my queries are easily distributable across multiple processors, since each processor could take a batch of keys). I might even want to consider a column-oriented database - since my keys don't change often, I could potentially add new columns when there are new type_ids.
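To illustrate what I mean by distributing over batches of keys: queries like the following could run on separate connections or nodes, and the per-key results just get concatenated afterwards (the key ranges and type_ids here are hypothetical):

    -- worker 1: first batch of keys
    SELECT agg.primary_id,
           avg(agg.value)    AS avg_value,
           stddev(agg.value) AS stddev_value
    FROM   agg
    WHERE  agg.type_id IN (101, 102, 103)
      AND  agg.primary_id BETWEEN 1 AND 10000
    GROUP  BY agg.primary_id;

    -- worker 2: next batch (primary_id BETWEEN 10001 AND 20000), and so on.
    -- No merge step is needed, since each primary_id lives in exactly one batch.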
I've been thinking of looking into memcached or HBase. If you have any suggestions on which options I should explore, I'd greatly appreciate it.
Sorry for veering a bit off-topic from Postgres.
thanks,
Anish
On Wed, Aug 17, 2011 at 10:32 PM, Craig Ringer <ringerc@xxxxxxxxxxxxx> wrote:
On 18/08/2011 9:03 AM, Anish Kejariwal wrote:

> Thanks for the help Pavel and Craig. I really appreciate it. I'm going to try a couple of these different options (write a C function, use a SQL function with CASE statements, and use PL/Perl), so I can see which gives me the realtime performance that I need, and works best for clean code in my particular case.

Do you really mean "realtime"? Or just "fast"?
If you have strongly bounded latency requirements, any SQL-based, disk-based system is probably not for you. Especially not one that relies on a statistics-based query planner, caching, and periodic checkpoints. I'd be looking into in-memory databases designed for realtime environments where latency is critical.
Hard realtime: If this system fails to respond within <x> milliseconds, all the time, every time, then something will go "smash" or "boom" expensively and unrecoverably.
Soft realtime: If this system responds late, the late response is expensive or less useful. Frequent late responses are unacceptable but the occasional one might be endurable.
Just needs to be fast: If it responds late, the user gets irritated because they're sitting and waiting for a response. Regular long stalls are unacceptable, but otherwise the user can put up with it. You're more concerned with average latency than maximum latency.
--
Craig Ringer