Re: Throttling xlator on the bricks

On 01/25/16 20:36, Pranith Kumar Karampuri wrote:


On 01/26/2016 08:41 AM, Richard Wareing wrote:
If there is one bucket per client and one thread per bucket, it would be difficult to scale as the number of clients increases. How can we do this better?
On this note... consider that tens of thousands of clients are not unrealistic in production :). Using a thread per bucket would also be... unwise.

There is only one thread, and this solution is about keeping internal processes (shd, rebalance, quota, etc.) from coming in the way of clients which do I/O.


On the idea in general, I'm just wondering if there are specific (real-world) cases where this has even been an issue that least-prio queuing hasn't been able to handle, or is this more of a theoretical concern? I ask as I've not really encountered situations where I wished I could give more FOPs to SHD vs. rebalance and such.

I have seen users resort to offline healing of the bricks whenever a brick is replaced, or a new brick is added to a replica set to increase the replica count. When entry self-heal happens, or big VM image data self-heals (which do rchecksums) run, CPU spikes are seen and I/O becomes unusable. Here is a recent thread (from just yesterday) where a user ran into a similar problem (a combination of client-side healing and healing load):
http://www.gluster.org/pipermail/gluster-users/2016-January/025051.html

We can find more such threads if we spend some time digging through the mailing list. I have personally seen people resort to things like "we let gluster heal over the weekend or at night when none of us are working on the volumes", etc.

I get such complaints at least weekly on the IRC channel. A lot of them are from virtual environments (AWS).


There are also people who complain that healing is too slow; we get both kinds of complaints :-). Your multi-threaded shd patch is going to help here. I somehow feel you guys are in this latter set of people :-).

+1



In any event, it might be worth having Shreyas detail his throttling feature (which can throttle any directory hierarchy, no less) to illustrate how a simpler design can achieve similar results to these more complicated (and, it follows, more bug-prone) approaches.

The solution we came up with is about throttling internal I/O, and there are only 4 or 5 such processes (shd, rebalance, quota, bitd, etc.). What you are saying above about throttling any directory hierarchy seems a bit different from what we are trying to solve, at least from the small description you gave above :-). Shreyas' mail detailing the feature would definitely help us understand what each of us is trying to solve. We want to GA both multi-threaded shd and this feature in 3.8.

Pranith

Richard

________________________________________
From: gluster-devel-bounces@xxxxxxxxxxx [gluster-devel-bounces@xxxxxxxxxxx] on behalf of Vijay Bellur [vbellur@xxxxxxxxxx]
Sent: Monday, January 25, 2016 6:44 PM
To: Ravishankar N; Gluster Devel
Subject: Re:  Throttling xlator on the bricks

On 01/25/2016 12:36 AM, Ravishankar N wrote:
Hi,

We are planning to introduce a throttling xlator on the server (brick) process to regulate FOPs. The main motivation is to address complaints about AFR self-heal taking up too much CPU (due to too many FOPs for entry self-heal, rchecksums for data self-heal, etc.).

I am wondering if we can re-use the same xlator for throttling bandwidth, IOPS, etc. in addition to FOPs. Based on admin-configured policies we could provide different upper thresholds to different clients/tenants, and this could prove to be a useful feature in multi-tenant deployments to avoid the starvation/noisy-neighbor class of problems. Has any thought gone in this direction?

The throttling is achieved using the Token Bucket Filter (TBF) algorithm. TBF is already used by bitrot's bitd signer (which is a client process) in gluster to regulate the CPU-intensive checksum calculation. By putting the logic on the brick side, multiple clients (self-heal, bitrot, rebalance, or even the mounts themselves) can avail themselves of the benefits of throttling.

The TBF algorithm in a nutshell is as follows: there is a bucket which is filled with tokens at a steady (configurable) rate. Each FOP needs a fixed number of tokens to be processed. If the bucket has that many tokens, the FOP is allowed and that many tokens are removed from the bucket. If not, the FOP is queued until the bucket fills up again.
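
A minimal sketch of such a bucket (all names here - tbf_bucket_t, tbf_admit, etc. - are hypothetical, not the actual gluster TBF code):

#include <stdbool.h>
#include <time.h>

typedef struct {
        double          tokens;    /* tokens currently in the bucket         */
        double          rate;      /* tokens added per second (configurable) */
        double          capacity;  /* maximum tokens the bucket can hold     */
        struct timespec last;      /* last refill time                       */
} tbf_bucket_t;

/* Top up the bucket based on the time elapsed since the last refill. */
static void
tbf_refill (tbf_bucket_t *b)
{
        struct timespec now;

        clock_gettime (CLOCK_MONOTONIC, &now);
        b->tokens += b->rate * ((now.tv_sec - b->last.tv_sec) +
                                (now.tv_nsec - b->last.tv_nsec) / 1e9);
        if (b->tokens > b->capacity)
                b->tokens = b->capacity;
        b->last = now;
}

/* Try to admit a FOP that costs 'cost' tokens. Returns true (and deducts
 * the tokens) if the FOP may proceed, false if it has to be queued. */
static bool
tbf_admit (tbf_bucket_t *b, double cost)
{
        tbf_refill (b);
        if (b->tokens < cost)
                return false;
        b->tokens -= cost;
        return true;
}

A FOP that fails tbf_admit() would be parked on a queue, as described below.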

The xlator will need to reside above io-threads and can have different buckets, one per client. There has to be a communication mechanism between the client and the brick (IPC?) to tell which FOPs need to be regulated for it, the number of tokens needed, etc. These need to be reconfigurable via appropriate mechanisms. Each bucket will have a token-filler thread which will fill the tokens in it. If there is one bucket per client and one thread per bucket, it would be difficult to scale as the number of clients increases. How can we do this better?

The main thread will enqueue heals in a list in the bucket if there aren't enough tokens. Once the token filler detects that some FOPs can be serviced, it will send a cond-broadcast to a dequeue thread, which will process (stack-wind) all the FOPs that have the required number of tokens, across all buckets.
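
Roughly, the hand-off could look like the sketch below (again with made-up names, reusing tbf_bucket_t/tbf_admit/tbf_refill from the earlier sketch, and a single bucket for brevity where the real xlator would scan all buckets):

#include <pthread.h>
#include <unistd.h>

typedef struct queued_fop {
        struct queued_fop *next;
        double             cost;   /* tokens this FOP needs                */
        /* call_frame_t *frame;       wind context would live here         */
} queued_fop_t;

typedef struct {
        tbf_bucket_t    bucket;    /* from the sketch above                */
        queued_fop_t   *head;      /* FOPs waiting for tokens              */
        pthread_mutex_t lock;
        pthread_cond_t  cond;      /* signalled by the token filler        */
} tbf_queue_t;

/* FOP path: admit the FOP right away, or park it on the queue. */
void
tbf_submit (tbf_queue_t *q, queued_fop_t *fop)
{
        pthread_mutex_lock (&q->lock);
        if (tbf_admit (&q->bucket, fop->cost)) {
                pthread_mutex_unlock (&q->lock);
                /* STACK_WIND the FOP towards io-threads here */
                return;
        }
        fop->next = q->head;
        q->head   = fop;
        pthread_mutex_unlock (&q->lock);
}

/* Token-filler thread: tops up the bucket and wakes the dequeuer. */
void *
tbf_filler (void *arg)
{
        tbf_queue_t *q = arg;

        for (;;) {
                usleep (100000);                  /* refill interval */
                pthread_mutex_lock (&q->lock);
                tbf_refill (&q->bucket);
                pthread_cond_broadcast (&q->cond);
                pthread_mutex_unlock (&q->lock);
        }
        return NULL;
}

/* Dequeue thread: winds every queued FOP that can now afford its tokens. */
void *
tbf_dequeuer (void *arg)
{
        tbf_queue_t   *q = arg;
        queued_fop_t **pp;

        for (;;) {
                pthread_mutex_lock (&q->lock);
                pthread_cond_wait (&q->cond, &q->lock);
                for (pp = &q->head; *pp; ) {
                        if (tbf_admit (&q->bucket, (*pp)->cost)) {
                                queued_fop_t *fop = *pp;
                                *pp = fop->next;
                                /* STACK_WIND fop's frame here */
                        } else {
                                pp = &(*pp)->next;
                        }
                }
                pthread_mutex_unlock (&q->lock);
        }
        return NULL;
}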

This is just a high-level abstraction; I'm requesting feedback on any aspect of this feature. What kind of mechanism is best between the clients and bricks for tuning the various parameters? What other requirements do you foresee?

I am in favor of having administrator-defined policies or templates (collections of policies) being used to provide the tuning parameters per client or per set of clients. We could even have a default template per use case, etc. Is there a specific need to have this negotiation between clients and servers?
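
Just to illustrate the shape such a template could take (everything below is made up for illustration; these are not existing gluster options or defaults):

#include <stdint.h>

/* Hypothetical administrator-defined throttling policy. */
typedef struct {
        const char *client_pattern;     /* e.g. "shd", "rebalance", "tenant-a/*"   */
        uint32_t    max_fops_per_sec;   /* steady token fill rate; 0 = unthrottled */
        uint32_t    max_bytes_per_sec;  /* optional bandwidth cap; 0 = unlimited   */
} throttle_policy_t;

/* A default template: internal daemons get a modest share, mounts run free. */
static const throttle_policy_t default_template[] = {
        { "shd",       500, 64 * 1024 * 1024 },
        { "rebalance", 300, 32 * 1024 * 1024 },
        { "*",           0, 0 },
};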

Thanks,
Vijay

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel


