On Friday 26 October 2007 8:56 am, tfinneid@xxxxxxxxxxxxxxxxxxxxx wrote: > > Serious engineering does not imply perfect engineering, I have analyzed it > and made my tradeoffs. What you are forgetting here is that you clearly > dont understand the enire solution, So I will try to explain it again. And > if you still think its bonkers, the I urge you to come up with a solution > that works with the requirements. > > Every predefined X seconds (e.g. 3,6,9,12 etc ) a bunch of data arrives, > which must be stored by descrete time groups, e.g. second 3,6,9,12. The > data that arrives is approx 4MB per second, so in this case its 12MB. This > has to be processed by the server and written to the db, within 1 second. > There can be up to 5 writers at the same time. Within that same second, at > least 16 readers should be able to read all the data, *each*. Writers and > readers are only concerned with the latest data, i.e. data from the latest > time group, e.g. second 9. > This has to go on every predefined seconds for the next 6-12 weeks, > without stop, pause or any errors. These are the requirements. > > When I performed performance tests I found several unwanted effects from > several test scenarios. Here are the most important ones: > > - single large table, with indexes created when table is created. > - this leads to the performance of an insert degrading as more data is > added, when I get > to 1 billion rows it took 50 seconds to add the data. > > My lesson from this is that > - single inserts can never be efficient enough > - indexes cause linear performance drop as data volume increases > > So I tried a different approach, which would address both issues: > > - separate tables for each bulk of data > - use of bulk insert through jdbc COPY. > - add indexes to the newly create table after the copy is finished. > > My lesson from this is: > - insert take constant time, no matter how much data is in the base > - adding the indexes after insert takes constant time, i.e. some > milliseconds. > > From this I realised that using either single tables or partitions is the > way to go, since I only need to access the latest data, i.e. the newest > table, in normal situations. > > After thinking about it and discussing with this group, I found that using > partitions would be more practical for two reasons: > - changes to the parent table is automatically propagated to all child > tables, so the schema remains consistent and the server wont brake because > of differences in the tables. > - it is more maintainable to use "create with inheritance" sql in source > code than the entire ddl of the table. > > So now I have tested the server 24/7 for a week and a half, with 1 writer > and 16 readers writing all the mentioned data, and everything works fine. > Expect for the select on the parent table, which now runs out of memory. > Which in it self is not a problem since I will never use the parent table > in production in any case. > > regards > > tom I might be missing the point, but couldn't you do a Copy to a single table instead of multiple inserts and avoid the index overhead. -- Adrian Klaver aklaver@xxxxxxxxxxx ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your message can get through to the mailing list cleanly