Re: How long should metrics collection on a cluster take?

Sankarshan Mukhopadhyay <sankarshan.mukhopadhyay@xxxxxxxxx> · Tue, 24 Jul 2018 22:10:00 +0530

On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri
<pkarampu@xxxxxxxxxx> wrote:
> hi,
>       Quite a few commands to monitor gluster at the moment take almost a
> second to give output.

Is this at the minimum recommended cluster size?

> Some categories of these commands:
> 1) Any command that needs to do some sort of mount/glfs_init.
>      Examples: 1) the heal info family of commands 2) statfs to find
> space availability, etc. (On my laptop, for a replica 3 volume with all local
> bricks, glfs_init takes 0.3 seconds on average.)
> 2) glusterd commands that need to wait for the previous command to unlock.
> If the previous command is something related to an LVM snapshot, which takes
> quite a few seconds, it would be even more time-consuming.
>
> Nowadays container workloads have hundreds of volumes, if not thousands.
> Suppose we want to serve a monitoring solution at this scale (I have seen
> customers use up to 600 volumes at a time, and it will only get bigger), and
> let's say collecting metrics takes 2 seconds per volume (let us take the
> worst example, which has all major features enabled, like
> snapshot/geo-rep/quota etc.). That means it will take 20 minutes
> to collect metrics for a cluster with 600 volumes. What are the ways in
> which we can make this number more manageable? I was initially thinking it
> may be possible to get gd2 to execute commands in parallel on different
> volumes, so potentially we could get this done in ~2 seconds. But quite a
> few of the metrics need a mount or the equivalent of a mount (glfs_init) to
> collect information such as statfs, number of pending heals, quota
> usage, etc. This may lead to high memory usage, as the size of the mounts
> tends to be high.
>

I am not sure the "worst example" (it certainly is not) is a good
place to start from. That said, for any environment with that number
of disposable volumes, what kind of metrics actually make any
sense/impact?
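
For reference, the arithmetic in the quoted estimate can be sketched as
below. This is only a back-of-the-envelope model, not anything gd2 does
today: the function name and the `workers` parameter are hypothetical, and
it assumes every volume costs the same fixed time and that concurrent
collections do not slow each other down (which the glusterd locking and
mount memory concerns above may well invalidate).

```python
import math


def estimate_collection_seconds(volumes, seconds_per_volume, workers=1):
    """Estimate wall-clock time to collect metrics for `volumes` volumes.

    Hypothetical model: volumes are processed in batches of `workers`,
    and each batch takes `seconds_per_volume` regardless of batch size.
    """
    if volumes < 0 or workers < 1:
        raise ValueError("volumes must be >= 0 and workers must be >= 1")
    batches = math.ceil(volumes / workers)
    return batches * seconds_per_volume


if __name__ == "__main__":
    # 600 volumes at 2 seconds each, with increasing parallelism.
    for workers in (1, 10, 600):
        seconds = estimate_collection_seconds(600, 2, workers)
        print(f"workers={workers}: {seconds} s ({seconds / 60:.1f} min)")
```

With one worker this reproduces the 20 minute figure (600 x 2 s = 1200 s),
and with one worker per volume it gives the ~2 seconds mentioned for the
fully parallel case. Anything in between is a trade-off against the number
of simultaneous mounts.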

> I wanted to seek suggestions from others on how to come to a conclusion
> about which path to take and what problems to solve.
>
> I will be happy to raise GitHub issues based on our conclusions on this mail
> thread.
>
> --
> Pranith
>

-- 
sankarshan mukhopadhyay
<https://about.me/sankarshan.mukhopadhyay>
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel