Re: Logical decoding client has the power to crash the server

Meel Velliste <meel@xxxxxxxxxxxx> · Thu, 21 Sep 2017 04:09:37 +0000

Hi Michael,
Thank you, I appreciate your response. Now that you mention it, I realize that I don't really care about dropping only the oldest log entries. Mandatory monitoring makes a lot of sense, and dropping the entire slot when it consumes too much space would be perfect.

The only problem with monitoring is that I may have no control over it. My use case is complicated by the fact that there are three parties:
1) Our customer who has admin privileges on the database
2) Us with limited privileges
3) The database hosting provider who restricts access to the underlying OS and file system

In this situation, neither we nor our customer has the power to install the required monitoring of pg_xlog. The database hosting provider would have to do it. In most cases (e.g. Amazon RDS) the hosting provider does provide a way of monitoring overall disk usage, which may be good enough. But I think it would make sense for postgres to have default, built-in monitoring that drops all the slots when pg_xlog gets too full (based on some configurable limit). Otherwise everybody has to build their own monitoring, and I imagine 99% of them would want the same behavior. Nobody wants their database to fail just because some client was not reading the slot.
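To make the proposed behavior concrete, here is a minimal sketch of the decision rule only, not an existing postgres feature. The function name slots_to_drop and the 10 GiB default are made up for illustration. It assumes the caller has already obtained, for each slot, the number of WAL bytes it holds back, for example from pg_replication_slots.

```python
from typing import Dict, List

# Hypothetical default; the proposal only says "some configurable limit".
MAX_RETAINED_BYTES = 10 * 1024**3  # 10 GiB, chosen arbitrarily


def slots_to_drop(retained: Dict[str, int],
                  limit: int = MAX_RETAINED_BYTES) -> List[str]:
    """Circuit breaker: if any slot holds back more than `limit` bytes
    of WAL, return every slot name; otherwise return an empty list.

    `retained` maps slot name -> bytes of WAL the slot forces the
    server to keep. On 9.x that figure can be computed in SQL as
    pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn).
    """
    if not retained:
        return []
    if max(retained.values()) > limit:
        # Tripped: drop all slots, as proposed above.
        return sorted(retained)
    return []


if __name__ == "__main__":
    sample = {"customer_feed": 12 * 1024**3, "audit": 1024}
    for name in slots_to_drop(sample):
        # A real tool would run: SELECT pg_drop_replication_slot(name)
        print("would drop slot:", name)
```

The rule drops every slot once the limit is exceeded because that is the behavior asked for here; a gentler variant would return only the slots whose own retention exceeds the limit.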

In our case, if we lose access to the customer's database and they have not installed monitoring (even though we told them to), their disk will fill up and they will blame us for crashing their database. It ends up being a classic case of finger-pointing between multiple parties. This has not happened yet, but I am sure it is just a matter of time. I would really like to see a default, built-in circuit breaker in postgres to prevent this.

Another bit of context here is that logical decoding is of secondary importance to our customers, but their postgres database itself is absolutely mission critical.

Thanks,

Meel

On Wed, Sep 20, 2017 at 12:43 AM Michael Paquier <michael.paquier@xxxxxxxxx> wrote:
On Wed, Sep 20, 2017 at 3:14 PM, Meel Velliste <meel@xxxxxxxxxxxx> wrote:

> From what I understand about logical decoding, there is no limit to how many
> log entries will be retained by the server if nobody reads them from the
> logical slot. This means that a client that fails to read from the slot has
> the power to bring down the master database because the server's disk will
> get full at which point all subsequent write operations will fail and even
> read operations will fail because they too need temporary space. Even the
> underlying operating system may be affected as it too may need temporary
> disk space to carry out its basic functions.

Monitoring is a mandatory part of the handling of replication slots.
One possible solution is to use a background worker that scans slots
causing bloat in pg_xlog and to automatically get rid of them so as
the primary is preserved from any crash. Note that advancing a slot is
doable for a physical slot, but advancing a logical slot is trickier
(not sure if that's doable actually but Andres can comment on that)
because it involves being sure that the catalog_xmin is still
preserved so as past logical changes can be looked at consistently.
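
The scanning step of such a worker could look roughly like the sketch below. It is an illustration under stated assumptions, not code from postgres: the helper names (lsn_to_bytes, retained_bytes, bloating_slots) are hypothetical, and the inputs are assumed to be the text form of the current WAL position and each slot's restart_lsn as reported by pg_replication_slots. An LSN prints as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit byte position.

```python
from typing import Dict, List, Optional


def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current position."""
    return max(0, lsn_to_bytes(current_lsn) - lsn_to_bytes(restart_lsn))


def bloating_slots(current_lsn: str,
                   slots: Dict[str, Optional[str]],
                   limit: int) -> List[str]:
    """Return the names of slots holding back more than `limit` bytes.

    `slots` maps slot name -> restart_lsn text, or None when the slot
    has not reserved any WAL yet (restart_lsn is NULL in that case).
    """
    offenders = []
    for name, restart_lsn in sorted(slots.items()):
        if restart_lsn is None:
            continue  # nothing retained on behalf of this slot
        if retained_bytes(current_lsn, restart_lsn) > limit:
            offenders.append(name)
    return offenders


if __name__ == "__main__":
    slots = {"stale": "0/1000000", "fresh": "2/FF000000", "unused": None}
    print(bloating_slots("3/0", slots, limit=1024**3))
```

This only identifies candidates. Getting rid of them would be a separate call to pg_drop_replication_slot for each name, and it says nothing about advancing a logical slot, which is the harder problem raised above.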

--

Michael