1. In my case, I'm erring on the side of not using the limited partitioning support in PG 8.3, which we're using .... because I'm generating new tables all the time, I need to dynamically generate the DML anyway, and it's actually less code to just do my own calculation on the application side to see which table name I need to use (as opposed to dynamically creating all the constraints and triggers needed to get PG to do it), and doing almost everything in vanilla SQL makes it much easier to port (I'm going to need to support Oracle within the next 12 months). I have the luxury of not having any application code which directly hits the database, it all goes through one persistence manager class which encapsulates both CRUD operations and bulk stats queries, so I don't have to fake the existence of the old table with views or such.
2. The idea of partitioning by "b", the performance counter type column, is an interesting one .... I am definitely going to consider it for a future release. For now, my new schema is going to end up turning one 300GB table (incl indexes) into about 100 tables of sizes ranging from about 0.1GB to 3.5GB each (with indexes), which feels like an OK size range for both manageability and performance (though I'd still be interested to see what people on this list think). When I get to storing multiple terabytes, having 6,000 tables is going to look more attractive :-)
3. If you were suggesting the (ts, a) index as an alternative to (a, b, ts) and (ts) that's an interesting and cunning plan ... I'd need to see how it performs on the queries which extract data series for graphing, which are of the form " .... where a=2617 and b=4 and ts between '2010-02-22 00:00' and '2010-02-25 00:00'" and which are interactive.
<testing>
I tried this and it took quite a bit longer than the version with the "natural" index (102ms vs 0.5ms both from buffer cache) ... and this is a test data set with only 1,000 pieces of equipment (values of a) and my design target is 25,000 pieces.
# explain analyse select time_stamp, value from foo where a=1 and b=14 and time_stamp between '2010-02-23 09:00' and '2010-02-23 12:00';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using foo1 on foo (cost=0.00..21196.67 rows=17 width=16) (actual time=0.133..102.571 rows=72 loops=1)
Index Cond: ((time_stamp >= '2010-02-23 09:00:00'::timestamp without time zone) AND (time_stamp <= '2010-02-23 12:00:00'::timestamp without time zone) AND (a = 1) AND (b = 14))
Total runtime: 102.720 ms
(3 rows)
Time: 103.844 ms
# explain analyse select time_stamp, value from perf_raw_2010_02_23 where a=1 and b=14 and time_stamp between '2010-02-23 09:00' and '2010-02-23 12:00';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on perf_raw_2010_02_23 (cost=5.37..68.55 rows=16 width=16) (actual time=0.107..0.319 rows=72 loops=1)
Recheck Cond: ((a = 1) AND (b = 14) AND (time_stamp >= '2010-02-23 09:00:00'::timestamp without time zone) AND (time_stamp <= '2010-02-23 12:00:00'::timestamp without time zone))
-> Bitmap Index Scan on perf_raw_2010_02_23_a (cost=0.00..5.36 rows=16 width=0) (actual time=0.082..0.082 rows=72 loops=1)
Index Cond: ((a = 1) AND (b = 14) AND (time_stamp >= '2010-02-23 09:00:00'::timestamp without time zone) AND (time_stamp <= '2010-02-23 12:00:00'::timestamp without time zone))
Total runtime: 0.456 ms
(5 rows)
Time: 1.811 ms
"foo" is just a copy of "perf_raw_2010_02_23" with the different indexing that Joe suggested.
I don't know how sophisticated the index scan in PG 8.3 is, i.e. whether it will do a sparse scan skipping btree nodes that can be constraint-excluded using the 2nd and subsequent fields of the index tuple. By the looks of the performance, it looks as if it is just doing a simple scan over all the index "leaf node" entries in the range and testing them individually.
Tom - is an optimization for this in / in plan for a future release?
For my purposes, I only need the ts-first index for doing data rollups, which are done at the end of each rollup period ... once I have started a new data table, I don't need that index anymore and I can drop it from all the older tables ... right now I am using 1 table per day for raw data, but keeping 30 days history, so I can drop the secondary index from 29 out of 30 which makes the disk space cost of it modest, and the insertion overhead seems to be covered.
Looks like the a-first index is definitely necessary for acceptable interactive performance ... it's powering a Flex UI that graphs multiple data series concurrently.
4. Apart from query performance, the big benefit I wanted from the sharding scheme is to get out of the DELETE and VACUUM business .... now all my data aging is table drops, which are much, much faster in PG :-)
On Mon, Feb 22, 2010 at 11:23 PM, Joe Conway <mail@xxxxxxxxxxxxx> wrote:
Without going through your very long set of questions in detail, it
strikes me that you might be better off if you:
1) use PostgreSQL partitioning (constraint exclusion)
2) partition by ts range
3) consider also including b in your partitioning scheme
4) create one index as (ts, a)
5) use dynamically generated SQL and table names in the application
code to create (conditionally) and load the tables
But of course test both this and your proposed method and compare ;-)
Also you might consider PL/R for some of your analysis (e.g. mode would
be simple, but perhaps not as fast):
http://www.joeconway.com/web/guest/pl/r
HTH,
Joe