On 21/05/13 22:45, Joseph Santaniello wrote:
> Hello All,
>
> I am exploring options for deploying a Gluster system, and one
> possible scenario we are contemplating would involve potentially
> thousands (1-2000) of volumes with correspondingly thousands of
> mounts.
>
> Is there any intrinsic reason why this would be a bad idea with
> Gluster?

Two thoughts occur to me.

Firstly, memory consumption: Gluster spawns a process for every volume
on the servers and for every mount on the clients, so you'd end up
with a lot of glusterfs processes running on each machine. That's a
lot of context switching for the kernel to do, and they'll use a
non-negligible amount of memory. I'm not sure what the real-world
memory requirement per process is; on a couple of machines I just
checked it looks like 15-30M (VmRSS minus VmLib), but your mileage may
vary. Even if your memory use per gluster process is only 24M, that's
still 48G of RAM just to launch a couple of thousand of them; if they
turn out to need more like 128M each, that's a quarter of a terabyte
of memory per machine. A rough way to measure this on your own boxes
is sketched at the end of this mail.

The second thing that worries me is that Gluster's recovery mechanism
has nothing to prevent simultaneous recovery across all the volumes on
a node. As soon as a bad node rejoins the cluster, all 2000 of your
volumes will start rebuilding at the same time, causing massive random
I/O load, and all your clients will starve. That happens to me even
with just a couple of dozen volumes, so I hate to think how it would
go with thousands! One possible workaround is also sketched below.

-Toby
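
P.S. If you want to check the per-process numbers on your own
hardware, here's a minimal sketch of the measurement I described:
Python, reading VmRSS and VmLib out of /proc. Matching processes by
the name "glusterfs" is just my assumption; adjust for glusterfsd etc.
as needed.

    #!/usr/bin/env python
    # Rough sketch: estimate per-process memory of glusterfs daemons by
    # reading VmRSS and VmLib from /proc/<pid>/status, i.e. the
    # VmRSS-VmLib figure quoted above.
    import os

    def status_kb(pid, field):
        # Return the value (in kB) of a Vm* field from
        # /proc/<pid>/status, or 0 if the field is absent.
        with open('/proc/%s/status' % pid) as f:
            for line in f:
                if line.startswith(field + ':'):
                    return int(line.split()[1])  # "VmRSS:  24576 kB"
        return 0

    total_kb = 0
    count = 0
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            with open('/proc/%s/comm' % pid) as f:
                comm = f.read().strip()
            if not comm.startswith('glusterfs'):
                continue
            kb = status_kb(pid, 'VmRSS') - status_kb(pid, 'VmLib')
        except IOError:
            continue  # process exited while we were scanning
        total_kb += kb
        count += 1
        print('%s (%s): %d MB' % (pid, comm, kb // 1024))

    if count:
        per_mb = total_kb // count // 1024
        print('average: ~%d MB per process' % per_mb)
        # Extrapolate to the 2000-volume scenario discussed above:
        print('2000 processes: ~%d GB' % (per_mb * 2000 // 1024))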
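
On the recovery storm: there's no built-in throttle that I know of,
but you could approximate one from outside by kicking heals off one
volume at a time and waiting for each to drain before starting the
next. A rough sketch, assuming the standard gluster CLI; the parsing
of the "heal <vol> info" output below is simplified and
version-dependent, so treat it as a starting point rather than a
finished tool:

    #!/usr/bin/env python
    # Rough sketch of an external heal throttle: trigger self-heal one
    # volume at a time instead of letting every volume rebuild at once.
    import subprocess
    import time

    def gluster_volumes():
        # "gluster volume list" prints one volume name per line.
        out = subprocess.check_output(
            ['gluster', 'volume', 'list']).decode()
        return out.split()

    def entries_pending(vol):
        # Sum the per-brick "Number of entries:" lines that
        # "gluster volume heal <vol> info" prints.
        out = subprocess.check_output(
            ['gluster', 'volume', 'heal', vol, 'info']).decode()
        total = 0
        for line in out.splitlines():
            if line.strip().startswith('Number of entries:'):
                total += int(line.split(':')[1])
        return total

    for vol in gluster_volumes():
        # Kick off an index heal, then let it drain before moving on.
        subprocess.call(['gluster', 'volume', 'heal', vol])
        while entries_pending(vol) > 0:
            time.sleep(30)

Serialising the heals like this trades a longer total recovery time
for keeping the random I/O load bounded, so clients on the volumes
that aren't currently healing stay responsive.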