On 9/14/18 10:04, Christophe Pettus wrote:
In our experience, it's actually quite common that an RDS shutdown (or even just applying parameter changes) can take a while. What's particularly concerning is that it's not predictable, and that can make it hard to schedule and manage maintenance windows. What we were told previously is that RDS queues the operations, and it can take a variable amount of time for the operation to be worked on from the queue. Is that not the case?

Thanks Christophe - even if it's not what Chris is running into, this is another good call-out.

It's important to distinguish here between the RDS parts and the community PostgreSQL parts. For this thread, I think it's just worth pointing out that RDS automation/tooling will report the database in a "modifying" state until it completes its management operations; however, the actual database unavailability is much shorter. RDS carefully engineers its processes to minimize the actual database unavailability itself. Chris has run into a problem where the PostgreSQL processes did not shut down, as evidenced by the error messages he mentioned, and as a result his database was actually unavailable to applications for an extended period. This is uncommon and concerning.

This isn't the right forum for discussing the RDS bits; let's take that to the AWS forums. It's not synchronous, but the time to complete should absolutely be predictable within reasonable bounds depending on the operation type. I don't know how anyone could use the platform otherwise! If anyone is unable to establish bounded expectations for some automated operation, I'd strongly encourage starting a thread on the AWS forums or opening a support ticket.

On 9/14/18 09:27, Adrian Klaver wrote:
The thing is I do not remember any posts to this list mentioning the same problem on a platform outside RDS. A quick search seems to confirm that.

I've met folks from other large fleet operators at PG conferences. There are all kinds of stories we don't find on the lists yet. :) Hopefully we're all getting better about closing the loop and sharing stuff back; that's part of the value large fleet operators can and should bring to the community. For the cases I've heard about, we haven't yet caught things quickly enough to get stack dumps, so I don't think we have particulars yet.

I don't know about this specific incident, but I do know that the RDS

-Jeremy

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services