We've been monitoring gluster with an icinga/nagios plugin we wrote that looks at 'gluster volume status' output (using --xml) to check that the volume and all bricks are in the 'started' state.

Additionally, for 'replicate' volumes, we look at 'gluster volume heal <vol> info' and 'gluster volume heal <vol> info heal-failed' output. We count the number of files waiting to be healed, and currently, if it's above 0 for 4 minutes, it results in an alert. For heal-failed, we look at the list of failed entries and alert if any in the list are from the previous 300 seconds (this list is a running log, rather than a real-time list of files that are not healed). We were also going to check the 'heal <vol> info split-brain' output, but that appears to be just a list of files without timestamps that doesn't get cleaned up after a file is successfully healed once the split-brain is resolved. There's no way to know if the issue is current or was from last month and has already been resolved. What we have is better than nothing, but it's not ideal, IMO.

I wish there were something more like NetApp's snapmirror status output, where you could monitor how many seconds behind replication was, or a percentage complete before being caught up. Given that gluster is file-based, it's not the same as snapmirror's snapshot-based replication, so I understand the difficulties here, but the idea is the same.

There's currently no way that I know of to see how much gluster needs to replicate, how fast it's happening, how soon it might be done, etc. You just get a list of files that gluster thinks need to get synced, and that's about it. Some of those files might need a full sync, while others might just need attributes changed or a block added. Gluster doesn't even really know that until it actually tries to heal the file, AFAIK.

Anyway, that's what we're currently doing. It's at least something.
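In case it's a useful starting point for anyone, here's a rough sketch of the heal-info side of that kind of plugin. This is Python, and it assumes the 3.x-era plain-text output format ("Number of entries: N" per brick, and a leading "YYYY-MM-DD HH:MM:SS" timestamp on heal-failed entries) -- the regexes, volume name, and thresholds are illustrative guesses you'd want to adapt, not our actual plugin:

```python
#!/usr/bin/env python
# Sketch of a nagios-style heal check. Assumptions (verify against your
# gluster version's output): per-brick "Number of entries: N" lines in
# 'heal <vol> info', and "YYYY-MM-DD HH:MM:SS <path>" lines in the
# 'heal <vol> info heal-failed' running log.
import re
import subprocess
import time

ENTRY_RE = re.compile(r"^Number of entries:\s*(\d+)", re.MULTILINE)
# Assumed heal-failed line format: timestamp at start of line.
FAILED_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", re.MULTILINE)

def count_unhealed(heal_info_output):
    """Sum the per-brick 'Number of entries' counts from heal info output."""
    return sum(int(n) for n in ENTRY_RE.findall(heal_info_output))

def recent_failures(heal_failed_output, now, window=300):
    """Count heal-failed log entries stamped within the last `window` seconds."""
    count = 0
    for stamp in FAILED_RE.findall(heal_failed_output):
        t = time.mktime(time.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
        if now - t <= window:
            count += 1
    return count

def main(volume="myvol"):  # volume name is a placeholder
    info = subprocess.check_output(
        ["gluster", "volume", "heal", volume, "info"],
        universal_newlines=True)
    failed = subprocess.check_output(
        ["gluster", "volume", "heal", volume, "info", "heal-failed"],
        universal_newlines=True)
    pending = count_unhealed(info)
    recent = recent_failures(failed, time.time())
    # Standard nagios exit codes: 0 OK, 1 WARNING, 2 CRITICAL.
    if recent > 0:
        print("CRITICAL: %d heal failures in the last 300s" % recent)
        return 2
    if pending > 0:
        print("WARNING: %d entries waiting to be healed" % pending)
        return 1
    print("OK: nothing pending")
    return 0
```

The 4-minute "above 0" hold-down we use isn't shown here; in icinga/nagios you'd get that for free by requiring several consecutive soft-state failures before the alert goes hard.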
We get a handful of false alerts if we're copying large files, or lots of them, for many minutes and gluster hasn't had a chance to catch up entirely, but that's fine for us. For the most part, we do very few writes, so gluster replication is always up-to-date.

I'd be curious to hear what others are doing to monitor gluster.

Todd

On Wed, Jul 31, 2013 at 12:35:05PM +0530, Sejal1 S wrote:
> Even I have a similar requirement in my setup.
>
> Please suggest us the correct way to ensure the replication.
>
> .Sejal
>
> From: Matthew Sacks <msacksdandb at gmail.com>
> To: gluster-users at gluster.org
> Date: 31-07-2013 03:21
> Subject: monitoring gluster replication status
> Sent by: gluster-users-bounces at gluster.org
>
> Hello,
> From what I've seen there is no way I can monitor cluster status via munin
> or any other method for that matter.
>
> How can I ensure replication is working properly?
>
> Thanks in advance,
> Matt
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users