On Tue, Jan 26, 2016 at 03:11:50AM +0000, Richard Wareing wrote:
> > If there is one bucket per client and one thread per bucket, it would be
> > difficult to scale as the number of clients increases. How can we do
> > this better?
>
> On this note... consider that tens of thousands of clients are not
> unrealistic in production :). Using a thread per bucket would also be
> unwise.
>
> On the idea in general, I'm just wondering if there are specific
> (real-world) cases where this has even been an issue that least-prio
> queuing hasn't been able to handle, or is this more of a theoretical
> concern? I ask as I've not really encountered situations where I wished
> I could give more FOPs to the SHD vs. rebalance and such.
>
> In any event, it might be worth having Shreyas detail his throttling
> feature (which can throttle any directory hierarchy, no less) to
> illustrate how a simpler design can achieve results similar to these
> more complicated (and, it follows, more bug-prone) approaches.

TBF isn't complicated at all - it's widely used for traffic shaping, in
cgroups, and in UML to rate-limit disk I/O. But I won't rush things; I'll
wait to hear from Shreyas about his throttling design.
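To show how little machinery a single bucket needs, here is a minimal,
self-contained sketch in C (hypothetical names, not the bitd code or the
proposed xlator; it assumes a filler thread that tops the bucket up once
a second):

    /* tbf_sketch.c - minimal token bucket; build with: cc -pthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  filled = PTHREAD_COND_INITIALIZER;
    static double tokens   = 0;    /* current bucket contents           */
    static double rate     = 10;   /* tokens added per second           */
    static double capacity = 50;   /* bucket never holds more than this */

    /* Filler thread: top the bucket up once a second, clamp at
     * capacity, and wake anyone waiting for tokens. */
    static void *filler(void *arg)
    {
        (void)arg;
        for (;;) {
            sleep(1);
            pthread_mutex_lock(&lock);
            tokens += rate;
            if (tokens > capacity)
                tokens = capacity;
            pthread_cond_broadcast(&filled);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    /* Block until 'cost' tokens are available, then consume them. */
    static void throttle(double cost)
    {
        pthread_mutex_lock(&lock);
        while (tokens < cost)
            pthread_cond_wait(&filled, &lock);
        tokens -= cost;
        pthread_mutex_unlock(&lock);
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, filler, NULL);
        for (int i = 0; i < 5; i++) {
            throttle(8);           /* pretend each FOP costs 8 tokens */
            printf("fop %d serviced\n", i);
        }
        return 0;
    }

(A production version would more likely compute the token count lazily
from a monotonic timestamp on each request, which avoids dedicating a
filler thread to every bucket in the first place.)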
> Richard
>
> ________________________________________
> From: gluster-devel-bounces@xxxxxxxxxxx [gluster-devel-bounces@xxxxxxxxxxx] on behalf of Vijay Bellur [vbellur@xxxxxxxxxx]
> Sent: Monday, January 25, 2016 6:44 PM
> To: Ravishankar N; Gluster Devel
> Subject: Re: Throttling xlator on the bricks
>
> On 01/25/2016 12:36 AM, Ravishankar N wrote:
> > Hi,
> >
> > We are planning to introduce a throttling xlator on the server (brick)
> > process to regulate FOPs. The main motivation is to address complaints
> > about AFR self-heal consuming too much CPU (due to too many FOPs for
> > entry self-heal, rchecksums for data self-heal, etc.).
>
> I am wondering if we can re-use the same xlator for throttling
> bandwidth, IOPS, etc. in addition to FOPs. Based on admin-configured
> policies we could provide different upper thresholds to different
> clients/tenants, and this could prove to be a useful feature in
> multi-tenant deployments to avoid starvation/noisy-neighbor classes of
> problems. Has any thought gone in this direction?
>
> > The throttling is achieved using the Token Bucket Filter (TBF)
> > algorithm. TBF is already used in gluster by bitrot's bitd signer
> > (which is a client process) to regulate the CPU-intensive checksum
> > calculation. By putting the logic on the brick side, multiple
> > clients - self-heal, bitrot, rebalance, or even the mounts
> > themselves - can avail themselves of the benefits of throttling.
> >
> > The TBF algorithm in a nutshell is as follows: there is a bucket
> > which is filled at a steady (configurable) rate with tokens. Each FOP
> > needs a fixed number of tokens to be processed. If the bucket has
> > that many tokens, the FOP is allowed and that many tokens are removed
> > from the bucket. If not, the FOP is queued until the bucket fills up
> > enough.
> >
> > The xlator will need to reside above io-threads and can have
> > different buckets, one per client. There has to be a communication
> > mechanism between the client and the brick (IPC?) to tell which FOPs
> > need to be regulated, the number of tokens needed, etc. These need to
> > be reconfigurable via appropriate mechanisms. Each bucket will have a
> > token-filler thread which fills the tokens in it.
>
> If there is one bucket per client and one thread per bucket, it would
> be difficult to scale as the number of clients increases. How can we do
> this better?
>
> > The main thread will enqueue heals in a list in the bucket if there
> > aren't enough tokens. Once the token filler detects that some FOPs
> > can be serviced, it will send a cond-broadcast to a dequeue thread,
> > which will process (stack-wind) all the FOPs that have the required
> > number of tokens, from all buckets.
> >
> > This is just a high-level abstraction; we are requesting feedback on
> > any aspect of this feature. What kind of mechanism is best between
> > the clients and bricks for tuning the various parameters? What other
> > requirements do you foresee?
>
> I am in favor of having administrator-defined policies or templates
> (collections of policies) being used to provide the tuning parameters
> per client or per set of clients. We could even have a default template
> per use case, etc. Is there a specific need to have this negotiation
> between clients and servers?
>
> Thanks,
> Vijay
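As for the enqueue/dequeue part of the proposal, the dequeue side could
look roughly like this (hypothetical structures and a made-up wind_fop()
helper, not actual xlator code; it assumes the filler thread broadcasts
'refill' after topping the buckets up):

    /* Park FOPs that can't pay for themselves; after each refill
     * broadcast, wind whatever each bucket can now afford. */
    #include <pthread.h>
    #include <stddef.h>

    typedef struct fop {
        struct fop *next;
        double      cost;            /* tokens this FOP needs       */
    } fop_t;

    typedef struct bucket {
        struct bucket *next;         /* all buckets, one per client */
        fop_t         *head, *tail;  /* FIFO of throttled FOPs      */
        double         tokens;
    } bucket_t;

    static pthread_mutex_t lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  refill = PTHREAD_COND_INITIALIZER;
    static bucket_t *buckets;        /* list of per-client buckets  */

    void wind_fop(fop_t *fop);       /* made-up: resume (stack-wind)
                                        the parked FOP              */

    /* Main thread: consume tokens if possible, else queue the FOP. */
    int throttle_or_queue(bucket_t *b, fop_t *fop)
    {
        int queued = 0;
        pthread_mutex_lock(&lock);
        if (b->tokens >= fop->cost) {
            b->tokens -= fop->cost;  /* enough tokens: pass through */
        } else {
            fop->next = NULL;        /* park in the bucket's FIFO   */
            if (b->tail)
                b->tail->next = fop;
            else
                b->head = fop;
            b->tail = fop;
            queued = 1;
        }
        pthread_mutex_unlock(&lock);
        return queued;
    }

    /* Dequeue thread: wind all FOPs whose bucket can now pay. */
    void *dequeue_thread(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        for (;;) {
            pthread_cond_wait(&refill, &lock);
            for (bucket_t *b = buckets; b; b = b->next) {
                while (b->head && b->tokens >= b->head->cost) {
                    fop_t *f = b->head;
                    b->head = f->next;
                    if (!b->head)
                        b->tail = NULL;
                    b->tokens -= f->cost;
                    wind_fop(f);
                }
            }
        }
        return NULL;
    }

Winding under the lock keeps the sketch short; a real implementation
would move the ready FOPs to a local list and wind them after dropping
the lock.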