You should definitely not patch the PAF source code without first opening an
issue on GitHub and discussing your changes. As Adrien explained, your changes
could very well lead to instance corruption or data loss.

On Wed, 25 Apr 2018 07:45:55 +0000
范国腾 <fanguoteng@xxxxxxxxxx> wrote:
...
> Is there any way to make the cluster recover if postgres was not properly
> stopped, such as after a lab power-off or an OS reboot?

A graceful OS reboot is supposed to result in a clean shutdown, so you should
not have trouble with PostgreSQL.

For real failure scenarios, you must:

* put your cluster in maintenance mode
* fix your PostgreSQL setup and replication
* make sure PostgreSQL is replicating correctly
* make sure the master stays on the node where the cluster expects it, if
  needed
* make sure to stop any PostgreSQL instance that the cluster considers stopped
* switch off maintenance mode
* you might need to start your resource if the cluster kept it stopped

You can find documentation here and in other pages around it:

  https://clusterlabs.github.io/PAF/administration.html

> -----Original Message-----
> From: Adrien Nayrat [mailto:adrien.nayrat@xxxxxxxxxxxx]
> Sent: April 25, 2018 15:29
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users@xxxxxxxxxxxxxxx>; 范国腾 <fanguoteng@xxxxxxxxxx>; Andrew Edenburn
> <andrew.edenburn@xxxxxx>; pgsql-general@xxxxxxxxxxxxxx
> Subject: Re: [ClusterLabs] Re: Postgres PAF setup
>
> On 04/25/2018 02:31 AM, 范国腾 wrote:
> > I have met a similar issue when postgres is not stopped normally.
> >
> > You could run pg_controldata to check whether your postgres status is
> > "shut down" or "shut down in recovery".
> >
> > I changed /usr/lib/ocf/resource.d/heartbeat/pgsqlms to avoid this
> > problem:
> >
> >     elsif ( $pgisready_rc == 2 ) {
> >         # The instance is not listening.
> >         # We check the process status using pg_ctl status and check
> >         # if it was propertly shut down using pg_controldata.
> >         ocf_log( 'debug', 'pgsql_monitor: instance "%s" is not listening',
> >             $OCF_RESOURCE_INSTANCE );
> >         # return _confirm_stopped();   # remove this line
> >         return $OCF_NOT_RUNNING;
> >     }
>
> Hello,
>
> It is a bad idea. The goal of _confirm_stopped is to check whether the
> instance was properly stopped. If it wasn't, you could corrupt your instance.
>
> _confirm_stopped returns $OCF_NOT_RUNNING only if the instance was properly
> shut down:
>
>     elsif ( $controldata_rc == $OCF_NOT_RUNNING ) {
>         # The controldata state is consistent, the instance was probably
>         # propertly shut down.
>         ocf_log( 'debug',
>             '_confirm_stopped: instance "%s" controldata indicates that the instance was propertly shut down',
>             $OCF_RESOURCE_INSTANCE );
>         return $OCF_NOT_RUNNING;
>     }
>
> Regards,
>
> --
> Adrien NAYRAT
>
> _______________________________________________
> Users mailing list: Users@xxxxxxxxxxxxxxx
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Jehan-Guillaume de Rorthais
Dalibo
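
Addendum: the pg_controldata check discussed in the quoted thread can be
illustrated with a small standalone sketch. This is not PAF code and not how
pgsqlms implements _confirm_stopped; it is written in Python only for
illustration. The function names and the abridged sample outputs are
assumptions made for this example, and it assumes pg_controldata prints its
output in English (the tool's messages can be localized).

```python
import re

# States that pg_controldata reports after a clean shutdown
# (a primary and a standby respectively).
CLEAN_STATES = {"shut down", "shut down in recovery"}


def cluster_state(controldata_output):
    """Return the 'Database cluster state' value, or None if it is absent."""
    match = re.search(
        r"^Database cluster state:\s*(.+?)\s*$",
        controldata_output,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def was_cleanly_stopped(controldata_output):
    """True only when the reported state is one of the clean shutdown states."""
    return cluster_state(controldata_output) in CLEAN_STATES


# Hypothetical, heavily abridged samples of pg_controldata output.
SAMPLE_CRASHED = "Database cluster state:               in production\n"
SAMPLE_CLEAN = "Database cluster state:               shut down\n"

print(was_cleanly_stopped(SAMPLE_CRASHED))  # False: state was never updated
print(was_cleanly_stopped(SAMPLE_CLEAN))    # True: clean shutdown recorded
```

To try it on a real instance, capture the output of pg_controldata for your
data directory and pass it to was_cleanly_stopped(). An instance that is not
running but still reports "in production" was not stopped cleanly, which is
the situation the removed _confirm_stopped call is meant to catch.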