Re: Using Postgres to store high volume streams of sensor readings

"Scott Marlowe" <scott.marlowe@xxxxxxxxx> · Sun, 23 Nov 2008 11:23:55 -0700

On Sun, Nov 23, 2008 at 10:01 AM, Ciprian Dorin Craciun
<ciprian.craciun@xxxxxxxxx> wrote:
> On Sun, Nov 23, 2008 at 3:28 PM, Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
>> * Ciprian Dorin Craciun (ciprian.craciun@xxxxxxxxx) wrote:
>>> > Even better might be partitioning on the timestamp.  IF all access is
>>> > in a certain timestamp range it's usually a big win, especially
>>> > because he can move to a new table every hour / day / week or whatever
>>> > and merge the old one into a big "old data" table.
>>>
>>>     Yes, If i would speed the inserts tremendously... I've tested it
>>> and the insert speed is somewhere at 200k->100k.
>>>
>>>     But unfortunately the query speed is not good at all because most
>>> queries are for a specific client (and sensor) in a given time
>>> range...
>>
>> Have you set up your partitions correctly (eg, with appropriate CHECK
>> constraints and with constraint_exclusion turned on)?  Also, you'd want
>> to keep your indexes on the individual partitions, of course..  That
>> should improve query time quite a bit since it should only be hitting
>> the partitions where the data might be.
>>
>>        Stephen
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.9 (GNU/Linux)
>>
>> iEYEARECAAYFAkkpWpEACgkQrzgMPqB3kihvyACgm6ITdkodTqZvDLCjqavj9lkR
>> w1oAnRrB1rbW+bF6Spr77VcH5/Mty4S6
>> =G7aX
>> -----END PGP SIGNATURE-----
>
>    Well, now that I've read the previous two emails better, I
> understand what Scot and Stephen are talking about...
>
>    So if I understood it correctly: I should build indexes only for
> certain parts of the data (like previous full hour and so). But I see
> a problem: wouldn't this lead to a lot of indices beeing created (24 /
> hour, ~150 / week, ...)?

No, not exactly what I'm talking about.  I'm talking about
pre-creating partitions that the data will soon go into (let's say a
new one every hour) with indexes in place, and having a trigger that
fires on insert to put the data into the right partition.  Once that
partition is no longer being inserted into, and we aren't running a
bunch of queries on it, we migrate it to a historical partition.

So, your table looks something like this all the time:

|**|^^|##|##|##|##|$$$$$$$$$$$$$$...|

Where:

** is a partition we have created in advance of needing it.
^^ is the partition we are currently writing to
## are the partitions we're still using in select queries a lot
$$$... are the old data stuffed into the monolithic history table.

When it's time to switch to writing to the new partition (i.e. **) we
make a new one ahead of that, and the trigger starts writing to what
was a ** partition but is now the new ^^, and the ^^ becomes a ##.  At
the end of the day / week whatever, we take all the old ## partitions
and move their data into the $$$ and drop the ## partitions.

Note that we only need to put data into an archive partition to keep
from having hundreds or thousands of partitions.  There's a limit of a
few hundred partitions where things start getting slow again due to
planner overhead.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general