Why not stay on EBS, but skip provisioned IOPS SSDs (io1) for the volume? Just use the default storage type (gp2) and live with the 3,000 IOPS burst ceiling, which a full credit balance sustains for roughly 30 minutes. You'd be amazed at just how much I/O can be handled within the default IOPS allowance. Bear in mind, though, that once you've started to eat into your credits, you earn them back at a rate proportional to the size of the volume, so someone using general-purpose SSDs (gp2) with 2 TB of storage will see different performance than someone using 100 GB. I recently moved several databases to gp2 storage and saved a ton of money doing so (we were paying for 5,000 IOPS and using 5 at peak, apart from brief bursts to a couple hundred during backups and restores). I've done numerous backups and restores on those hosts since then, have had no trouble keeping up, and have never come close to the 3,000 IOPS theoretical max, even briefly. Replication doesn't appear to be bothered, either.
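To put rough numbers on how volume size changes the picture, here is a minimal sketch of the gp2 credit arithmetic. The constants (3 IOPS per GiB baseline, 100 IOPS floor, 3,000 IOPS burst, 5.4 million credit bucket) are what I understand AWS to document for gp2 and should be treated as assumptions to check against the current docs; the function names are made up for illustration, and the sketch ignores the per-volume IOPS ceiling.

```python
# Assumed gp2 figures (verify against current AWS documentation).
BASELINE_IOPS_PER_GIB = 3     # credits earned per second, per GiB
MIN_BASELINE_IOPS = 100       # floor for very small volumes
BURST_IOPS = 3000             # burst ceiling for volumes below it
CREDIT_BUCKET = 5_400_000     # I/O credits in a full bucket


def gp2_baseline_iops(size_gib):
    """Baseline IOPS, which is also the credit refill rate."""
    return max(MIN_BASELINE_IOPS, BASELINE_IOPS_PER_GIB * size_gib)


def burst_seconds(size_gib, demand_iops=BURST_IOPS):
    """Seconds a full credit bucket lasts at a sustained demand.

    Returns None when the demand is at or below the baseline,
    because the bucket never drains in that case.
    """
    baseline = gp2_baseline_iops(size_gib)
    demand = min(demand_iops, BURST_IOPS)  # bursting is capped
    if demand <= baseline:
        return None
    # Credits drain at (demand - baseline) per second.
    return CREDIT_BUCKET / (demand - baseline)


if __name__ == "__main__":
    for size in (100, 500, 2000):
        print(size, gp2_baseline_iops(size), burst_seconds(size))
```

Under these assumptions a 100 GiB volume has a 300 IOPS baseline and can hold 3,000 IOPS for about 33 minutes from a full bucket, while a 2,000 GiB volume has a 6,000 IOPS baseline and never dips into credits at all.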
Going to ephemeral storage seems unnecessarily problem-prone, since the data is gone when an instance dies, and I'm not even sure it is an option in RDS or on recent EC2 instance types, which require EBS volumes even for the boot volume. Besides, EBS with general-purpose storage isn't much more expensive than ephemeral.
On Mon, Jan 29, 2018 at 08:42 Pritam Barhate <pritambarhate@xxxxxxxxx> wrote:
Hi everyone,

As you may know, EBS volumes, though durable, are very costly when you need provisioned IOPS. As opposed to this, AWS instance-attached ephemeral SSD is very fast but isn't durable.

I have come across some ideas on the Internet where people hinted at running production PostgreSQL workloads on AWS ephemeral SSD storage. Generally, this involves shipping WAL logs continuously to S3 and keeping an async read replica in another AWS availability zone. The worst-case scenario in such a deployment is data loss of a few seconds. But beyond this the details are sketchy.

Have you come across such a deployment? What are some best practices that need to be followed to pull this through without significant data loss? Even though WAL logs are being shipped to S3, in case of loss of both the instances, the restore time is going to be quite a bit for databases of a few hundred GBs.

Just to be clear, I am not planning anything like this anytime soon :-) But I am curious about the trade-offs of such a deployment. Any concrete information in this respect is well appreciated.

Regards,
Pritam.