Re: select count() out of memory

Adrian Klaver <aklaver@xxxxxxxxxxx> · Sat, 27 Oct 2007 14:34:07 -0700



On Friday 26 October 2007 8:56 am, tfinneid@xxxxxxxxxxxxxxxxxxxxx wrote:
>
> Serious engineering does not imply perfect engineering, I have analyzed it
> and made my tradeoffs. What you are forgetting here is that you clearly
> dont understand the enire solution, So I will try to explain it again. And
> if you still think its bonkers, the I urge you to come up with a solution
> that works with the requirements.
>
> Every predefined X seconds (e.g. 3,6,9,12 etc ) a bunch of data arrives,
> which must be stored by descrete time groups, e.g. second 3,6,9,12. The
> data that arrives is approx 4MB per second, so in this case its 12MB. This
> has to be processed by the server and written to the db, within 1 second.
> There can be up to 5 writers at the same time. Within that same second, at
> least 16 readers should be able to read all the data, *each*. Writers and
> readers are only concerned with the latest data, i.e. data from the latest
> time group, e.g. second 9.
> This has to go on every predefined seconds for the next 6-12 weeks,
> without stop, pause or any errors. These are the requirements.
>
> When I performed performance tests I found several unwanted effects from
> several test scenarios. Here are the most important ones:
>
> - single large table, with indexes created when table is created.
>    - this leads to the performance of an insert degrading as more data is
> added, when I get
>      to 1 billion rows it took 50 seconds to add the data.
>
>      My lesson from this is that
>       - single inserts can never be efficient enough
>       - indexes cause linear performance drop as data volume increases
>
> So I tried a different approach, which would address both issues:
>
> - separate tables for each bulk of data
>    - use of bulk insert through jdbc COPY.
>    - add indexes to the newly create table after the copy is finished.
>
>    My lesson from this is:
>    - insert take constant time, no matter how much data is in the base
>    - adding the indexes after insert takes constant time, i.e. some
> milliseconds.
>
> From this I realised that using either single tables or partitions is the
> way to go, since I only need to access the latest data, i.e. the newest
> table, in normal situations.
>
> After thinking about it and discussing with this group, I found that using
> partitions would be more practical for two reasons:
> - changes to the parent table is automatically propagated to all child
> tables, so the schema remains consistent and the server wont brake because
> of differences in the tables.
> - it is more maintainable to use "create with inheritance" sql in source
> code than the entire ddl of the table.
>
> So now I have tested the server 24/7 for a week and a half, with 1 writer
> and 16 readers writing all the mentioned data, and everything works fine.
> Expect for the select on the parent table, which now runs out of memory.
> Which in it self is not a problem since I will never use the parent table
> in production in any case.
>
> regards
>
> tom
I might be missing the point, but couldn't you do a Copy to a single table 
instead of multiple inserts and avoid the index overhead.
-- 
Adrian Klaver
aklaver@xxxxxxxxxxx

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
       message can get through to the mailing list cleanly