Re: CephFS client side metadata ops throttling based on quotas

On Thu, 28 Feb 2019 at 22:13, Xiaoxi Chen <superdebuger@xxxxxxxxx> wrote:
>
>
>
> Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote on Thu, 28 Feb 2019 at 17:36:
>>
>> On Thu, 28 Feb 2019 at 09:50, Xiaoxi Chen <superdebuger@xxxxxxxxx> wrote:
>> >
>> > I doubt throttling will work.
>> >
>> > If your two workloads can stay in separate namespaces (i.e. not sharing any data), you can easily achieve isolation with multi-MDS + dir_pin.
>> >
>> Hi, thanks for your reply:-)
>>
>> We are an infrastructure department that offers a distributed file
>> system service to other departments, each of which has a lot of
>> different workloads, some of which may be bursty while others are
>> not. Each of those workloads runs in an exclusive directory. What's
>> more, even non-bursty workloads may occasionally do some "extra
>> things" that produce lots of metadata requests. It's hard for us to
>> know in which directory a bursty workload will show up, so it's hard
>> to predict which directories should be pinned to other MDSes.
>
>
> I am also in a similar position and offer a similar service. We prepare a separate set of MDSes for EACH tenant (/ceph_root/<tenant_id>) to provide metadata isolation. Rate limiting and bursty workloads are not the only issue; there is also cache pollution from things like "ls -R".
>
> The only downside of this solution is that, for each single user, you cannot have multi-MDS, as mds_pin doesn't provide an affinity_mask. But it offers nice isolation.
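
For reference, dir_pin is driven by the ceph.dir.pin virtual extended
attribute, so a per-tenant layout like the one described above can be
pinned with a few setxattr calls. Below is only a minimal sketch; the
mount point, tenant paths and ranks are placeholders:

    import os

    # Map each tenant root to the MDS rank that should own its subtree.
    # Paths and ranks here are placeholders for illustration only.
    TENANT_PINS = {
        "/mnt/cephfs/ceph_root/tenant_a": 0,
        "/mnt/cephfs/ceph_root/tenant_b": 1,
    }

    for path, rank in TENANT_PINS.items():
        # Setting ceph.dir.pin to an MDS rank pins the subtree to that
        # rank; a value of -1 removes the pin and lets the balancer decide.
        os.setxattr(path, "ceph.dir.pin", str(rank).encode())
        print(f"pinned {path} to mds.{rank}")

The same thing from a shell is just "setfattr -n ceph.dir.pin -v <rank> <dir>".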

Hi, thanks for your information, it's very helpful to us;-)

The difference between us is that each of our tenants has a lot of
"sub-tenants" of its own. It is the fact that some "sub-tenants"
interfere with others under the same "parent-tenant" that gives us the
most headache. So we resorted to implementing some kind of QoS to
solve the issue.
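
To make "some kind of QoS" a bit more concrete, one obvious shape for it
is a per-directory token bucket on metadata ops in the client. The sketch
below is purely hypothetical (the class name, the numbers, and where
acquire() would be called are all made up), not actual CephFS code:

    import time

    class MetadataOpThrottle:
        """Hypothetical per-directory token bucket for client-side metadata ops."""

        def __init__(self, ops_per_sec, burst):
            self.rate = float(ops_per_sec)   # sustained metadata ops/sec allowed
            self.burst = float(burst)        # headroom for short spikes
            self.tokens = self.burst
            self.last = time.monotonic()

        def acquire(self):
            # Refill tokens from elapsed time (capped at the burst size),
            # then block until one token is available for this metadata op.
            while True:
                now = time.monotonic()
                self.tokens = min(self.burst,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1.0:
                    self.tokens -= 1.0
                    return
                time.sleep((1.0 - self.tokens) / self.rate)

    # e.g. cap one "sub-tenant" directory at 500 metadata ops/sec, burst 1000,
    # calling acquire() before each getattr/lookup/create/etc. is sent.
    throttle = MetadataOpThrottle(ops_per_sec=500, burst=1000)
    throttle.acquire()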

By the way, may I ask what kinds of business are using CephFS in your
company? I'm asking because we are trying to judge which kinds of
business we should recommend NOT to use CephFS. It's totally
understandable if you can't share the information:-)

Thanks again:-)
>>
>>
>>
>> > If your bursty workload and your regular workload target the same namespace, then throttling may cause more lock contention and in general make things worse... e.g. an "ls -al" translates into a readdir plus a lot of getattrs, holding the read lock on the dir. If you throttle the getattrs, the read lock is held for a longer time and blocks the writers.
>> >
>> If client-side throttling is not an option, do you think server-side
>> metadata op QoS is feasible? And is dmclock suitable for implementing
>> it? Thanks:-)
>
>
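
For context on the dmclock question above: dmclock schedules by giving
each client reservation, weight and limit parameters and tagging every
request with them. The sketch below is only a rough single-server
illustration of that tagging idea, not the actual dmclock API; the
client parameters and the scheduling loop are simplified placeholders:

    class MClockClient:
        """Simplified per-client state for mClock-style tagging (not the dmclock API)."""

        def __init__(self, reservation, weight, limit):
            self.reservation = reservation   # guaranteed ops/sec
            self.weight = weight             # share of spare capacity
            self.limit = limit               # hard cap on ops/sec
            self.r_tag = self.p_tag = self.l_tag = 0.0

        def tag_request(self, now):
            # Each new request advances the tags by 1/rate, but never lets
            # them lag behind "now", so idle clients don't bank credit.
            self.r_tag = max(self.r_tag + 1.0 / self.reservation, now)
            self.l_tag = max(self.l_tag + 1.0 / self.limit, now)
            self.p_tag = max(self.p_tag + 1.0 / self.weight, now)

    def pick_next(clients_with_pending_ops, now):
        # Reservation phase: serve the smallest R tag that is already due.
        due = [c for c in clients_with_pending_ops if c.r_tag <= now]
        if due:
            return min(due, key=lambda c: c.r_tag)
        # Weight phase: among clients still under their limit, smallest P tag.
        eligible = [c for c in clients_with_pending_ops if c.l_tag <= now]
        return min(eligible, key=lambda c: c.p_tag) if eligible else None

The real dmclock adds the distributed delta/rho bookkeeping on top of
this, but the per-client reservation/weight/limit knobs are what would
map onto per-directory metadata QoS.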