Re: Implement QoS for CephFS

Hi Gregory,

Thanks for your reply; comments are inline.

On Fri, Jul 26, 2019 at 3:01 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Wed, Jul 24, 2019 at 8:29 PM Songbo Wang <songbo1227@xxxxxxxxx> wrote:
> >
> > Hi guys,
> >
> > As a distributed filesystem, all CephFS clients share the whole
> > cluster's resources, for example IOPS and throughput. In some cases
> > a few clients can monopolize these resources, so QoS for CephFS is
> > needed in many cases.
> >
> > Based on the token bucket algorithm, I have implemented QoS for CephFS.
> >
> > The basic idea is as follows:
> >
> >   1. The QoS info is stored in one of the directory's xattrs.
> >   2. All clients accessing the same directory share the same QoS setting.
> >   3. Similar to the quota config flow, when the MDS receives the QoS
> > setting, it broadcasts the message to all clients.
> >   4. The limits can be changed online.
> >
> >
> > QoS is configured as follows; it supports the
> > {limit,burst} x {iops,bps,read_iops,read_bps,write_iops,write_bps}
> > settings, for example:
> >
> >       setfattr -n ceph.qos.limit.iops     -v 200 /mnt/cephfs/testdirs/
> >       setfattr -n ceph.qos.burst.read_bps -v 200 /mnt/cephfs/testdirs/
> >       getfattr -n ceph.qos.limit.iops            /mnt/cephfs/testdirs/
> >       getfattr -n ceph.qos                       /mnt/cephfs/testdirs/
> >
> >
> > But there is also a big problem: for the bps settings
> > (bps/write_bps/read_bps), if the limit is lower than the request's block
> > size, the client will be blocked until it accumulates enough tokens.
> >
> > Any suggestions will be appreciated, thanks!
> >
> > PR: https://github.com/ceph/ceph/pull/29266
>
> I briefly skimmed this and if I understand correctly, this lets you
> specify a per-client limit on hierarchies. But it doesn't try and
> limit total IO across a hierarchy, and it doesn't let you specify
> total per-client limits if they have multiple mount points.
>
> Given this, what's the point of maintaining the QoS data in the
> filesystem instead of just as information that's passed when the
> client mounts?

Sorry for the brief description of my design in my previous email.
I have considered two kinds of design, as follows:
1. All clients use the same QoS setting, just as implemented in this PR.
There may be multiple mount points; if we limit the total IO, the number
of mount points is effectively limited as well. So in my implementation,
the total IOPS & BPS are not limited.

2. All clients share a specified total QoS setting. I think there are two
kinds of use cases in detail:
2.1 Set a total limit and limit every client to the average:
total_limit/clients_num.
2.2 Set a total limit and let the MDS decide each client's limit based on
its historical IOPS & BPS.

I think both of these designs have their own usage scenarios,
so I need more suggestions about this feature.
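
To make case 2.1 concrete, here is a minimal, hypothetical sketch (not the
code in the PR; the names QoSInfo, ClientShare and split_total_limit are
made up for illustration) of how the MDS side could divide a directory's
total limit evenly across the client sessions using that directory:

// Hypothetical sketch of design 2.1: the MDS splits a directory's total
// QoS limit evenly across the sessions currently using the directory and
// would rebroadcast the per-client share whenever a session is added or
// removed. All names here are illustrative, not from the PR.
#include <cstdint>
#include <iostream>
#include <vector>

struct QoSInfo {
  uint64_t total_limit_iops = 0;  // e.g. set via ceph.qos.limit.iops
  uint64_t total_limit_bps  = 0;  // e.g. set via ceph.qos.limit.bps
};

struct ClientShare {
  uint64_t limit_iops = 0;
  uint64_t limit_bps  = 0;
};

// Each of the n clients gets total/n; the remainder goes to the first
// clients so the shares still sum to the configured total.
static std::vector<ClientShare> split_total_limit(const QoSInfo& qos,
                                                  size_t num_clients) {
  std::vector<ClientShare> shares(num_clients);
  for (size_t i = 0; i < num_clients; ++i) {
    shares[i].limit_iops = qos.total_limit_iops / num_clients +
                           (i < qos.total_limit_iops % num_clients ? 1 : 0);
    shares[i].limit_bps  = qos.total_limit_bps / num_clients +
                           (i < qos.total_limit_bps % num_clients ? 1 : 0);
  }
  return shares;
}

int main() {
  QoSInfo qos{200, 100 << 20};  // 200 IOPS and 100 MiB/s in total
  auto shares = split_total_limit(qos, 3);
  for (size_t i = 0; i < shares.size(); ++i)
    std::cout << "client " << i << ": " << shares[i].limit_iops
              << " iops, " << shares[i].limit_bps << " B/s\n";
  return 0;
}

Case 2.2 would replace this even split with weights derived from each
session's recent IOPS & BPS statistics as seen by the MDS.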

> How hard is this scheme likely to be to implement in the kernel?

It's not difficult to implement this in the kernel client. I think most of
the work is porting the TokenBucketThrottle logic into the kernel.
I have started this work and will push the code when it is finished.
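
For reference, below is a minimal userspace sketch of the token-bucket
idea that needs to be ported. This is not the actual TokenBucketThrottle
code and not kernel code (a kernel port would use timers/workqueues rather
than a condition variable); it only shows the core logic, including where
a bps limit smaller than a request's block size makes the caller wait.

// Minimal userspace sketch of a token bucket, for illustration only; it
// is not the Ceph TokenBucketThrottle implementation. Tokens accumulate
// at the configured rate up to the burst size, and a request proceeds
// only once it can take enough tokens.
#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

class TokenBucket {
public:
  TokenBucket(uint64_t rate, uint64_t burst)
      : rate_(rate), burst_(burst), tokens_(burst),
        last_(std::chrono::steady_clock::now()) {}

  // Block the caller until `cost` tokens are available, then consume them.
  // A bps limit lower than the request's block size stalls the request
  // here until enough tokens have accumulated.
  void get(uint64_t cost) {
    std::unique_lock<std::mutex> lk(mtx_);
    refill();
    while (tokens_ < cost) {
      cv_.wait_for(lk, std::chrono::milliseconds(50));
      refill();
    }
    tokens_ -= cost;
  }

private:
  // Add rate_ tokens per second since the last refill, capped at burst_.
  void refill() {
    auto now = std::chrono::steady_clock::now();
    std::chrono::duration<double> dt = now - last_;
    tokens_ = std::min<double>(burst_, tokens_ + dt.count() * rate_);
    last_ = now;
  }

  uint64_t rate_;    // tokens added per second (iops or bytes/s)
  uint64_t burst_;   // maximum tokens that can accumulate
  double tokens_;    // currently available tokens
  std::mutex mtx_;
  std::condition_variable cv_;
  std::chrono::steady_clock::time_point last_;
};

int main() {
  TokenBucket bucket(4 << 20, 8 << 20);  // 4 MiB/s limit, 8 MiB burst
  bucket.get(1 << 20);                   // 1 MiB request, returns at once
  return 0;
}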

Thanks.



