Re: Question on Hot Standby in PostgreSQL

"Williams, Alex" <awilliams@xxxxxxxxxxx> · Tue, 24 Nov 2015 16:28:24 +0000

Thank you both, I really appreciate your responses, and hopefully this may help others who may run into this.

I noticed on the link to that ref. it was 9.3, we have 9.2 (which can use that param too,) but I didn't realize I was looking at the 9.0 docs and didn't see that param there.

Concerning these two params below, when using hot_standby_feedback,  what would you set them if you had the situation where you had queries constantly running on the standby, some quick and some long, practically all day/night, everyday?

max_standby_archive_delay (integer)
max_standby_streaming_delay (integer)

[http://www.postgresql.org/docs/9.2/static/hot-standby.html]

Remedial possibilities exist if the number of standby-query cancellations is found to be unacceptable. The first option is to set the parameter
hot_standby_feedback, which prevents 
VACUUM from removing recently-dead rows and so cleanup conflicts do not occur. If you do this, you should note that this will delay cleanup of dead rows on the primary, which may result in undesirable table bloat. However, the cleanup situation will be
 no worse than if the standby queries were running directly on the primary server, and you are still getting the benefit of off-loading execution onto the standby.
max_standby_archive_delay must be kept large in this case, because delayed WAL files might already contain entries that conflict with the desired standby queries.

Should I still keep them as -1 (infinite)? If I do, will the WAL updates still be applied while current queries are running, for example:

1. Query1 starts SELECT (10 minute query) @ 10:00:00 AM (current row version xid's)
2. The WAL update is applied to the standby (2 seconds) starts at 10:00:08 AM, completes at 10:00:10 AM
3. Query2 starts SELECT @ 10:00:15 AM (starts 5 seconds after the previous WAL update was applied; will it get the new data -- row versions xids -- just applied from the WAL update from #2? It looks like the xids are preserved on the primary when that
 param is set: hot_standby_feedback, but just want to confirm, that while the primary maybe more bloated now with dead rows, the new and existing (which could be dead on the primary)  are there for the standby to see?)

And sorry, just one last question (I'll be going over the docs thoroughly today and will be testing these in two VM's before I do anything in production) and that is: for the vacuuming of dead rows on the master, I assume that will now occur once there
 are no queries running on the standby? We have a lot of cpu power/space/mem, so that won't be an issue, but I'm just curious what will trigger that, I just assume no active queries running would do that -- pls let me know if I'm wrong.

I've seen this, but prefer not to set that because it may still kill long running queries on the standby.
"Another option is to increase 
vacuum_defer_cleanup_age on the primary server, so that dead rows will not be cleaned up as quickly as they normally would be.
This will allow more time for queries to execute before they are canceled on the standby, without having to set a high
max_standby_streaming_delay. However it is difficult to guarantee any specific execution-time window with this approach, since
vacuum_defer_cleanup_age is measured in transactions executed on the primary server."

Again,  thank you both.

Alex

From: Yaser Raja <yrraja@xxxxxxxxx>

Date: Monday, November 23, 2015 4:08 PM

To: Scott Ribe <scott_ribe@xxxxxxxxxxxxxxxx>

Cc: "Williams, Alex" <awilliams@xxxxxxxxxxx>, "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>

Subject: Re:  Question on Hot Standby in PostgreSQL

On Mon, Nov 23, 2015 at 3:24 PM, Scott Ribe 
<scott_ribe@xxxxxxxxxxxxxxxx> wrote:

On Nov 20, 2015, at 12:50 PM, Williams, Alex <awilliams@xxxxxxxxxxx> wrote:

>

> However, I thought with the use of MVCC,

> an active query on the slave (long running, 30 seconds+) can run reading

> from row xid versions/snapshot...

The problem is that the master could well make changes that would remove rows in that snapshot because the master does not know anything about the rows that need preserving for the replica's query.

hot_standby_feedback addresses this problem. 

Here is the detail from the PG docs:

"Specifies whether or not a hot standby will send feedback to the primary or upstream standby about queries currently executing on the standby. This parameter can be used to eliminate query
cancels caused by cleanup records, but can cause database bloat on the primary for some workloads. Feedback messages will not be sent more frequently than once per
wal_receiver_status_interval. The default value is
off. This parameter can only be set in the
postgresql.conf file or on the server command line."
http://www.postgresql.org/docs/9.3/static/runtime-config-replication.html

Note that this will only avoid the conflicts caused by cleanup records (which normally is the majority of conflicts). 

That said, IIRC there may be something in 9.4 or 9.5 which provides for the replica to send that info to the server in order to coordinate things, but I haven't used that and so cannot offer more.

--

Scott Ribe

scott_ribe@xxxxxxxxxxxxxxxx

http://www.elevated-dev.com/

https://www.linkedin.com/in/scottribe/

(303) 722-0567 voice

--

Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)

To make changes to your subscription:

http://www.postgresql.org/mailpref/pgsql-admin