Re: Large data and slow queries

Samuel Williams <space.ship.traveller@xxxxxxxxx> · Fri, 28 Apr 2017 10:47:58 +1200

Citus is also now just an extension.

Sounds pretty awesome, I'll certainly consider your system if/when we
decide to make changes.

On 22 April 2017 at 08:41, Andrew Staller <andrew@xxxxxxxxxxxxx> wrote:
> Samuel,
>
> Short answer to your questions:  (1) TimescaleDB and CitusDB are focusing on
> solving different problems, and (2) TimescaleDB is an Apache 2-licensed
> extension to run in your Postgres database, not a fork or different system.
>
> Longer answer to your first question:
>
> From what we've read and the conversations we've had with Citus (awesome
> folks, btw), they're primarily solving a different problem -- OLAP queries
> and distributed transactions -- while we are focused on time-series data
> analysis.  As such, we haven't benchmarked against Citus and if we were to,
> it would require some unrealistic finagling in order to make it an
> apples-to-apples comparison.
>
> As an example, their partitioning is based on a single primary key, while
> all writes in their clustered version go to a cluster master (although their
> hosted version of Citus MX changes this a bit).  We perform two-dimensional
> partitioning using your primary key and time, with the partitions
> automatically created/closed on time intervals and data volume in order to
> keep the most recent partitioned chunks of data (and their B-trees) in
> memory to support fast ingest for time-series data. (***) We have only
> released our single-node version, but the clustering will allow you to write
> directly to any node.  Citus has also decided only to support the SQL
> queries they are optimized for, while we've made the decision to support
> "full SQL" (even if some queries might not be fast).
>
> *** We blogged yesterday about TimescaleDB's partitioning design choices in
> more depth, if you are interested:
> https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c
>
>
> On Thu, Apr 20, 2017 at 5:43 PM, Samuel Williams
> <space.ship.traveller@xxxxxxxxx> wrote:
>>
>> Andrew, how would timescaledb compare to citus - and is timescaledb an
>> extension to postgres or is it an entirely separate system?
>>
>> On 21 April 2017 at 02:44, Andrew Staller <andrew@xxxxxxxxxxxxx> wrote:
>> > Awesome thread.
>> >
>> > Samuel,
>> >
>> > Just wanted you to be aware of the work we're doing at TimescaleDB
>> > (http://www.timescale.com/), a time-series database extension for
>> > PostgreSQL.
>> >
>> > Some of how we might help you:
>> > - automatic partitioning by space (primary key - like country_id, for
>> > instance) and time. This creates "chunks" of your data, right-sized by
>> > volume and time constraints (which makes inserts fast at much greater
>> > scale
>> > than Vanilla PostgreSQL - not sure if this is a problem for you)
>> > - this will also help if your queries are selective on time and
>> > country_id
>> > (per this example)
>> > - the partitioning by time allows you to DROP old chunks without the
>> > need
>> > for vacuums
>> >
>> > On Thu, Apr 20, 2017 at 8:30 AM, Vick Khera <vivek@xxxxxxxxx> wrote:
>> >>
>> >> I'm curious why you have so many partial indexes. Are you trying to
>> >> make
>> >> custom indexes per query? It seems to me you might want to consider
>> >> making
>> >> the indexes general, and remove the redundant ones (that have the same
>> >> prefix list of indexed fields).
>> >>
>> >> Secondly your table is 102Gb. Clearly there's a lot of data here. How
>> >> many
>> >> rows does that take? I would further suggest that you partition this
>> >> table
>> >> such that there are no more than about 10 million rows per partition
>> >> (I've
>> >> done this by using a id % 100 computation). Maybe in your case it makes
>> >> sense to partition it based on the "what" field, because it appears you
>> >> are
>> >> trying to do that with your partial indexes already.
>> >>
>> >> On Wed, Apr 19, 2017 at 10:11 PM, Samuel Williams
>> >> <space.ship.traveller@xxxxxxxxx> wrote:
>> >>>
>> >>> Okay, so after changing longitude/latitude to float4, and
>> >>> re-organizing the table a bit, I got the query down from about 8
>> >>> minutes to 40 seconds.
>> >>>
>> >>> The details are in the gist comments:
>> >>> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121
>> >>>
>> >>> Now, just need to get performance another 2 orders of magnitude
>> >>> better. Can we make the index a bit more coarse grained, perhaps
>> >>> convert long/lat to integers or something, use a hilbert index, use a
>> >>> postgis index... ideas?
>> >>>
>> >>>
>> >>> --
>> >>> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
>> >>> To make changes to your subscription:
>> >>> http://www.postgresql.org/mailpref/pgsql-general
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > TimescaleDB | Growth & Developer Evangelism
>> > c: 908.581.9509
>> >
>> > 335 Madison Ave.
>> > New York, NY 10017
>> > www.timescale.com
>> > https://github.com/timescale/timescaledb
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>
>
>
> --
> TimescaleDB | Growth & Developer Evangelism
> c: 908.581.9509
>
> 335 Madison Ave.
> New York, NY 10017
> www.timescale.com
> https://github.com/timescale/timescaledb

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general