Re: Throttling xlator on the bricks

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Tue, 26 Jan 2016 09:03:45 +0530

On 01/26/2016 08:14 AM, Vijay Bellur wrote:
On 01/25/2016 12:36 AM, Ravishankar N wrote:
Hi,

We are planning to introduce a throttling xlator on the server (brick)
process to regulate FOPS. The main motivation is to address complaints
that AFR self-heal consumes too much CPU (because of the many FOPS
issued for entry self-heal, rchecksums for data self-heal, etc.).

I am wondering if we can re-use the same xlator for throttling
bandwidth, IOPS, etc. in addition to FOPS. Based on admin-configured
policies, we could provide different upper thresholds to different
clients/tenants. This could prove to be a useful feature in
multi-tenant deployments to avoid starvation and noisy-neighbor
problems. Has any thought gone in this direction?

Nope. At the moment it is mainly about internal processes.

The throttling is achieved using the Token Bucket Filter (TBF)
algorithm. TBF is already used in gluster by bitrot's bitd signer
(which is a client process) to regulate the CPU-intensive checksum
calculation. By putting the logic on the brick side, multiple clients
(self-heal, bitrot, rebalance, or even the mounts themselves) can
benefit from throttling.

The TBF algorithm in a nutshell is as follows: there is a bucket which
is filled with tokens at a steady (configurable) rate. Each FOP needs a
fixed number of tokens to be processed. If the bucket has that many
tokens, the FOP is allowed and that many tokens are removed from the
bucket. If not, the FOP is queued until the bucket has enough tokens.
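
A minimal sketch of the bucket described above, for illustration only.
The class and method names are hypothetical and are not taken from the
gluster sources. The capacity cap is an added assumption that keeps an
idle bucket from growing without bound. Refills are driven by explicit
calls instead of a timer so that the behaviour is easy to follow.

```python
class TokenBucket:
    """Illustrative token bucket; names and the capacity cap are assumptions."""

    def __init__(self, fill_rate, capacity):
        self.fill_rate = fill_rate  # tokens added on each fill() call
        self.capacity = capacity    # assumed upper bound on stored tokens
        self.tokens = 0

    def fill(self):
        # Stands in for the steady, configurable refill.
        self.tokens = min(self.capacity, self.tokens + self.fill_rate)

    def try_consume(self, cost):
        # Allow the FOP only if the bucket holds at least `cost` tokens.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # the caller would queue the FOP here


bucket = TokenBucket(fill_rate=2, capacity=4)
bucket.fill()                   # bucket now holds 2 tokens
first = bucket.try_consume(2)   # enough tokens: FOP is allowed
second = bucket.try_consume(2)  # bucket is empty: FOP has to wait
```

Here the first FOP is admitted and the second is refused until a later
fill() adds more tokens.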

The xlator will need to reside above io-threads and can have different
buckets, one per client. There has to be a communication mechanism
between the client and the brick (IPC?) to tell the brick which FOPS
from that client need to be regulated, the number of tokens needed,
etc. These need to be reconfigurable via appropriate mechanisms. Each
bucket will have a token filler thread which will fill the tokens in
it.

If there is one bucket per client and one thread per bucket, it would
be difficult to scale as the number of clients increases. How can we do
this better?

It is the same thread for all the buckets, because the number of
internal clients at the moment is in single digits. The problem
statement we have right now doesn't consider what you are looking for.

The main thread will enqueue heals in a list in the bucket if there
aren't enough tokens. Once the token filler detects that some FOPS can
be serviced, it will send a cond-broadcast to a dequeue thread, which
will process (stack wind) all the FOPS that have the required number of
tokens from all buckets.
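
The enqueue and dequeue flow above can be sketched as a single-threaded
simulation. This is not the proposed implementation: the filler thread
and the cond-broadcast are replaced by explicit filler_tick() and
dequeue() calls, the per-client token cost is assumed to be fixed, and
all names (Bucket, Throttle, the client labels) are hypothetical.

```python
from collections import deque


class Bucket:
    """Per-client bucket with a FIFO of FOPS waiting for tokens."""

    def __init__(self, fill_rate, cost_per_fop):
        self.fill_rate = fill_rate        # tokens added per filler tick
        self.cost_per_fop = cost_per_fop  # assumed fixed cost of one FOP
        self.tokens = 0
        self.waiting = deque()


class Throttle:
    def __init__(self):
        self.buckets = {}  # client name -> Bucket

    def add_client(self, client, fill_rate, cost_per_fop):
        self.buckets[client] = Bucket(fill_rate, cost_per_fop)

    def submit(self, client, fop):
        """Return True if the FOP may be wound now; otherwise queue it."""
        b = self.buckets[client]
        if not b.waiting and b.tokens >= b.cost_per_fop:
            b.tokens -= b.cost_per_fop
            return True
        b.waiting.append(fop)
        return False

    def filler_tick(self):
        """One pass of the single filler over every bucket."""
        for b in self.buckets.values():
            b.tokens += b.fill_rate

    def dequeue(self):
        """Release, in FIFO order, every queued FOP that is now affordable."""
        released = []
        for client, b in self.buckets.items():
            while b.waiting and b.tokens >= b.cost_per_fop:
                b.tokens -= b.cost_per_fop
                released.append((client, b.waiting.popleft()))
        return released


throttle = Throttle()
throttle.add_client("selfheal", fill_rate=1, cost_per_fop=2)
throttle.add_client("rebalance", fill_rate=2, cost_per_fop=1)

queued_now = [throttle.submit("selfheal", "rchecksum"),
              throttle.submit("rebalance", "readv")]
throttle.filler_tick()
after_one_tick = throttle.dequeue()   # only the rebalance FOP is affordable
throttle.filler_tick()
after_two_ticks = throttle.dequeue()  # the self-heal FOP is released now
```

In a threaded version, filler_tick() would run on a timer and signal a
condition variable, and dequeue() would run in the thread woken by that
signal, with a lock protecting the buckets.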

This is just a high-level abstraction; I am requesting feedback on any
aspect of this feature. What kind of mechanism is best between the
clients and bricks for tuning the various parameters? What other
requirements do you foresee?

I am in favor of having administrator-defined policies or templates
(collections of policies) being used to provide the tuning parameters
per client or per set of clients. We could even have a default template
per use case etc. Is there a specific need to have this negotiation
between clients and servers?
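
One way to picture the template idea is a lookup table kept on the
server side, with a default for clients that have no explicit entry.
This is only a sketch: the template names, the parameter names, and the
numbers are hypothetical, and nothing here reflects an existing gluster
option.

```python
# Hypothetical templates; names and numbers are illustrative only.
TEMPLATES = {
    "default": {"fill_rate": 100, "cost_per_fop": 1},
    "selfheal": {"fill_rate": 20, "cost_per_fop": 2},
}

# Hypothetical administrator assignment of clients to templates.
CLIENT_TEMPLATE = {
    "glustershd": "selfheal",
}


def policy_for(client):
    """Return the tuning parameters for a client, or the default template."""
    return TEMPLATES[CLIENT_TEMPLATE.get(client, "default")]


shd_policy = policy_for("glustershd")
mount_policy = policy_for("some-fuse-mount")
```

With such a table the brick could apply limits without any negotiation
from the client side.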

Thanks,
Vijay

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
