Hi,
I'd recommend looking into two solutions here, depending on whether you want to keep the data and on how fault tolerant the app needs to be (could you lose the data from the last hour?).
If you could lose some data and you don't intend to keep the whole data set, I'd look at using RAM as your storage - it's the fastest place to put data, and you can easily get servers with lots of RAM these days.
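To make that concrete, one way to experiment with RAM-backed storage in Postgres is a tablespace on a tmpfs mount - the path below is made up, and anything in it is gone after a crash or reboot, so only put data there that you can afford to lose:

-- /mnt/ramdisk is an example tmpfs mount, writable by the postgres user
CREATE TABLESPACE ram_ts LOCATION '/mnt/ramdisk';
-- keep the high-churn table on it; its contents vanish if the box goes down
CREATE TABLE events_hot (ts timestamptz NOT NULL, payload text) TABLESPACE ram_ts;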
If you can't lose data then you'll need to use disks - depending on how big each row is, compute your expected read/write throughput and go shopping :)
For 1 KB rows that's roughly 60 GB per hour, or about 17 MB per sec - simple disks can handle this.
For 10 KB rows it's about 170 MB/s - now you need better disks :)
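(Spelling out the arithmetic: 1.5 billion rows/day x 1 KB/row is ~1.5 TB/day, i.e. ~62 GB/hour, i.e. ~17 MB/s sustained - and ten times that for 10 KB rows.)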
SSDs are cheap these days, but they don't cope well with constant write/delete churn, so that might cause problems down the line (hence the RAM recommendation above).
As for keeping the raw data - you could easily do that with partitions. If you have daily partitions inheriting from a master table you can quickly access the last day (or even several days).
Do take note that a full table scan still takes time, and you'll need to manage writing to the right partition yourself (not that hard - simply insert into <tablename>_<date>, see the sketch below).
If you partition the data daily, keeping the old partitions adds no real load (just a little extra planning time for queries that go through the master table).
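To sketch the <tablename>_<date> idea with inheritance-based partitioning (table and column names here are just examples):

-- master table that queries and roll-ups go through
CREATE TABLE events (
    ts      timestamptz NOT NULL,
    payload text
);

-- one child table per day, e.g. for 2015-08-03
CREATE TABLE events_20150803 (
    CHECK (ts >= '2015-08-03' AND ts < '2015-08-04')
) INHERITS (events);

-- the app writes straight to the daily child
INSERT INTO events_20150803 (ts, payload)
VALUES ('2015-08-03 12:00', 'some row');

-- with constraint_exclusion = partition (the default), a query on the
-- master that filters on ts only scans the matching children
SELECT count(*) FROM events
WHERE ts >= '2015-08-03' AND ts < '2015-08-04';

Emptying a day later is then just DROP TABLE events_20150803, which is much cheaper than a bulk DELETE.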
Just my 2 cents,
Regards,
- Jony
On Mon, Aug 3, 2015 at 9:53 AM, Chris Withers <chris@xxxxxxxxxxxxxxxx> wrote:
Hi All,
I'm curious if there are recommendations for scaling postgres to what, for me, seems like "a lot" of data...
The app in question currently writes around 1.5 billion rows into a table before rolling them up into tables that have a few million roll-up rows each. That 1.5 billion row table is emptied and refilled each day, so we're talking about quite high write as well as quite high read. Where can I find good examples/docs of how to scale postgres for this kind of data load? What sort of hardware would I be looking to spec?
Okay, now this app may well eventually want to progress to storing those 1.5 billion rows per day. Is that feasible with postgres? If not, what storage and processing solutions would people recommend for that kind of data load?
cheers,
Chris