Re: Using AWS ephemeral SSD storage for production database workload?

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> · Mon, 29 Jan 2018 18:32:16 +0100

On 01/29/2018 05:41 PM, Pritam Barhate wrote:
> Hi everyone, 
> 
> As you may know, EBS volumes though durable are very costly when you 
> need provisioned IOPS. As opposed to this AWS instance attached
> ephemeral SSD is very fast but isn't durable.
> 
> I have come across some ideas on the Internet where people hinted at 
> running production PostgreSQL workloads on AWS ephemeral SSD
> storage. Generally, this involves shipping WAL logs continuously to
> S3 and keeping an async read replica in another AWS availability
> zone. Worst case scenario in such deployment is data loss of a few
> seconds. But beyond this the details are sketchy.
> 

Both log shipping and async replication are ancient features, and should
be well understood. What exactly is unclear?

> Have you come across such a deployment? What are some best practices 
> that need to be followed to pull this through without significant
> data loss? Even though WAL logs are being shipped to S3, in case of
> loss of both the instances, the restore time is going to be quite a bit
> for databases of a few hundred GBs.
> 

Pretty much everyone who is serious about HA is running such a cluster.
If they can't afford any data loss, they use synchronous replicas
instead. That's a basic latency-durability trade-off.
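
To make that trade-off concrete, here is a minimal sketch of the
primary-side settings that differ between the two modes. The setting
names are standard postgresql.conf parameters, but the helper functions,
the standby name "replica_az2" and the max_wal_senders value are
assumptions made up for illustration. archive_command is left out on
purpose, because it depends on whatever tool ships the WAL to S3.

```python
# Illustrative sketch only. replication_settings(), render() and the
# standby name "replica_az2" are hypothetical; the keys are standard
# postgresql.conf parameters.

def replication_settings(synchronous, standby_name="replica_az2"):
    """Return primary-side postgresql.conf settings as a dict of strings."""
    settings = {
        "wal_level": "replica",   # enough WAL for archiving and streaming
        "archive_mode": "on",     # WAL archiving (archive_command not shown)
        "max_wal_senders": "5",   # arbitrary value; must be > 0 for streaming
    }
    if synchronous:
        # Commits wait until the named standby confirms the WAL flush:
        # no committed transaction is lost on failover, but every commit
        # pays the network round trip to the standby.
        settings["synchronous_standby_names"] = "'%s'" % standby_name
        settings["synchronous_commit"] = "on"
    else:
        # No standby is waited for: commits return after the local WAL
        # flush, so a failover can lose the last few seconds of changes.
        settings["synchronous_standby_names"] = "''"
    return settings


def render(settings):
    """Format the settings as postgresql.conf lines, sorted by name."""
    return "\n".join("%s = %s" % (k, v) for k, v in sorted(settings.items()))


if __name__ == "__main__":
    print("# asynchronous replica")
    print(render(replication_settings(synchronous=False)))
    print("# synchronous replica")
    print(render(replication_settings(synchronous=True)))
```

The only real difference is whether synchronous_standby_names is empty.
Everything else (archiving, streaming) is the same in both setups.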

> Just to be clear, I am not planning anything like this, anytime soon
> :-) But I am curious about trade-offs of such a deployment. Any
> concrete information in this aspect is well appreciated.
> 

Pretty much everyone is using such an architecture (primary + streaming
replicas) nowadays, so it's a reasonably well understood scenario. But
it's really unclear what kind of information you expect to get, or how
much time you have spent reading about this.

There is quite a bit of information in the official docs, although maybe
a bit too low level - it certainly gives you the building blocks instead
of a complete solution. There are also books like [1] for example.

And finally there are tools that help with managing such clusters, like
for example [2]. Not only is it a rather bad idea to implement this on
your own (bugs, unnecessary effort), but the tools also show how to do
stuff.

[1]
https://www.packtpub.com/big-data-and-business-intelligence/postgresql-replication-second-edition

[2] https://repmgr.org/

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services