On 01/29/2018 05:41 PM, Pritam Barhate wrote:
> Hi everyone,
>
> As you may know, EBS volumes, though durable, are very costly when you
> need provisioned IOPS. As opposed to this, AWS instance-attached
> ephemeral SSD is very fast but isn't durable.
>
> I have come across some ideas on the Internet where people hinted at
> running production PostgreSQL workloads on AWS ephemeral SSD
> storage. Generally, this involves shipping WAL logs continuously to
> S3 and keeping an async read replica in another AWS availability
> zone. Worst case scenario in such a deployment is data loss of a few
> seconds. But beyond this the details are sketchy.
>

Both log shipping and async replication are ancient features, and
should be well understood. What exactly is unclear?

> Have you come across such a deployment? What are some best practices
> that need to be followed to pull this through without significant
> data loss? Even though WAL logs are being shipped to S3, in case of
> loss of both the instances, the restore time is going to be quite a
> bit for databases of a few hundred GBs.
>

Pretty much everyone who is serious about HA is running such a cluster.
If they can't afford any data loss, they use synchronous replicas
instead. That's a basic latency-durability trade-off.

> Just to be clear, I am not planning anything like this, anytime soon
> :-) But I am curious about trade-offs of such a deployment. Any
> concrete information in this aspect is well appreciated.
>

Pretty much everyone is using such an architecture (primary + streaming
replicas) nowadays, so it's a reasonably well understood scenario. But
it's really unclear what kind of information you expect to get, or how
much time you have spent reading about this.

There is quite a bit of information in the official docs, although
maybe a bit too low level - it certainly gives you the building blocks
instead of a complete solution. There are also books like [1] for example.
And finally there are tools that help with managing such clusters, like
for example [2]. Not only is it a rather bad idea to implement this on
your own (bugs, unnecessary effort), but the tools also show how to do
stuff.

[1] https://www.packtpub.com/big-data-and-business-intelligence/postgresql-replication-second-edition

[2] https://repmgr.org/

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services