Re: replication monitoring

I just realized a better question might be: is it safe to grep out the following line from the replication status results?

nsds5replicaLastUpdateStatus: 1 Can't acquire busy replica

Or can that error message also occur when a replica stays permanently busy, and has therefore effectively failed?
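As an alternative to grepping the line out, here is a rough Python sketch that classifies each `nsds5replicaLastUpdateStatus` value so that "busy" remains visible as its own state instead of disappearing. It assumes that each value fits on one line (LDIF output may wrap long values), that status code 0 means success, and that the busy case can be recognised by the "busy replica" text. The function name is hypothetical.

```python
import re

# Matches lines like "nsds5replicaLastUpdateStatus: 1 Can't acquire busy replica".
# Assumption: values are not wrapped across LDIF continuation lines.
STATUS_RE = re.compile(
    r"^nsds5replicaLastUpdateStatus:[ \t]*(-?\d+)[ \t]*(.*)$", re.MULTILINE
)


def classify_statuses(ldif_text):
    """Return (code, message, verdict) tuples for every status found.

    verdict is "ok" for code 0, "busy" when the message mentions a busy
    replica (assumed to be transient), and "error" for anything else.
    """
    results = []
    for match in STATUS_RE.finditer(ldif_text):
        code = int(match.group(1))
        message = match.group(2).strip()
        if code == 0:
            verdict = "ok"
        elif "busy replica" in message.lower():
            verdict = "busy"
        else:
            verdict = "error"
        results.append((code, message, verdict))
    return results
```

Because whether a replica can stay busy indefinitely is exactly the open question, the sketch does not treat "busy" as healthy. The caller decides how long that state may persist before it counts as a failure.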

Thanks,
Russ.

On Aug 27, 2015, at 7:06 PM, Russell Beall <beall@xxxxxxx> wrote:

Thanks for that. I had looked into it, but it was a bit heavyweight for what we are trying to do. I was hoping there was an easy way to have the command-line check simply ignore the condition where the servers are temporarily busy.

We are using AWS, with CloudWatch storing statistics and monitoring for events, so we just want to send a boolean value indicating whether or not replication is functioning.
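Sending that boolean to CloudWatch could look roughly like this. The sketch assumes a boto3 CloudWatch client (`boto3.client("cloudwatch")`), whose `put_metric_data` call takes `Namespace` and `MetricData`. The namespace, metric name and `Host` dimension are hypothetical choices, not anything 389 or AWS prescribes.

```python
def build_metric_data(replication_ok, host=None, metric_name="ReplicationHealthy"):
    # CloudWatch metrics are numeric, so encode the boolean as 1.0 / 0.0.
    datum = {
        "MetricName": metric_name,  # hypothetical metric name
        "Value": 1.0 if replication_ok else 0.0,
        "Unit": "Count",
    }
    if host is not None:
        # One dimension value per node, since each node runs its own check.
        datum["Dimensions"] = [{"Name": "Host", "Value": host}]
    return [datum]


def publish(client, replication_ok, host=None, namespace="Custom/389DS"):
    # `client` is assumed to be a boto3 CloudWatch client; passing it in
    # keeps this function testable without AWS credentials.
    client.put_metric_data(
        Namespace=namespace,  # hypothetical namespace
        MetricData=build_metric_data(replication_ok, host),
    )
```

Usage would be along the lines of `publish(boto3.client("cloudwatch"), ok, host=socket.gethostname())`. A CloudWatch alarm on the metric then does the alerting. It is worth having that alarm also fire when no data arrives, since that would mean the check itself has stopped running.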

I think I will just have the command retry and report a failure if it has not succeeded within a certain number of seconds.
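The retry-until-deadline idea can be sketched in Python as follows. `check` stands for whatever runs the replication status search and returns True when everything looks healthy. The helper name and the default interval are illustrative assumptions.

```python
import time


def succeeds_within(check, timeout_s, interval_s=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Return True as soon as check() returns True, or False once
    timeout_s seconds have passed without a success.

    clock and sleep are injectable so the loop can be tested without
    actually waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline, so the final check lands on it.
        sleep(min(interval_s, remaining))
```

A monotonic clock is used so that wall-clock adjustments (NTP steps, for example) cannot stretch or shrink the window.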

Regards,
Russ.


On Aug 20, 2015, at 11:49 PM, Alexander Jung <alexander.w.jung@xxxxxxxxx> wrote:

Hi,

we use http://cnmonitor.sourceforge.net/ to keep an eye on our ldap servers, including replication.

Works nicely and sends mail if something goes amiss.

Best regards,

Alexander Jung

2015-08-21 4:35 GMT+02:00 Russell Beall <beall@xxxxxxx>:
Hello,

I have deployed an MMR (multi-master replication) cluster with a recent (circa April) version of 389 from the CentOS 6 repository.

Following example 2 of this document, I have tried to set up a monitoring script on each node to verify that replication is correctly succeeding:

The monitoring search usually works from the command line, but while replication is in progress it returns a false positive for replication errors because some of the replicas are busy.

Rather than grepping out the word “busy”, which might cause us to miss the case where everything is failing because every replica is busy, I thought I should ask for recommendations on handling this.

My best idea is to run the command several times over several seconds and, if it fails more than X times in a row, issue an alert. Of course, that wouldn’t work if a longer-than-usual replication were underway. Is there a better way to do this?
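The “more than X failures in a row” rule amounts to a small counter that resets on any success. This is a minimal Python sketch; the class name is hypothetical, and `max_consecutive` plays the role of X.

```python
class ConsecutiveFailureAlarm:
    """Fire only when more than max_consecutive checks fail in a row."""

    def __init__(self, max_consecutive):
        if max_consecutive < 0:
            raise ValueError("max_consecutive must be non-negative")
        self.max_consecutive = max_consecutive
        self.failures = 0

    def record(self, ok):
        """Record one check result; return True if an alert should fire."""
        if ok:
            self.failures = 0  # any success resets the streak
        else:
            self.failures += 1
        return self.failures > self.max_consecutive
```

As noted above, a long replication run keeps the counter climbing. One possible refinement, offered as a suggestion rather than an established practice, is to give hard errors and “busy” results separate thresholds. A short limit would apply to hard errors and a much longer one to “busy”, so a replica that never stops being busy still raises an alert eventually.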

Thank you,
Russ.



--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users