Re: per client mds throttle

On Wed, Jul 3, 2019 at 2:58 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> On Tue, 2 Jul 2019, Dan van der Ster wrote:
> > Hi,
> >
> > Are there any plans to implement a per-client throttle on mds client requests?
> >
> > We just had an interesting case where a new cephfs user was hammering
> > an mds from several hosts. In the end we found that their code was
> > doing:
> >
> >   while (d := getafewbytesofdata()):
> >       f = open('file.dat', 'a')
> >       f.write(d)
> >       f.close()
> >
> > By changing their code to:
> >
> >   f = open('file.dat', 'a')
> >   while (d := getafewbytesofdata()):
> >       f.write(d)
> >   f.close()
> >
> > the load on the mds disappears completely (for obvious reasons).
>
> This is a tangential point, but: at some point in the past at least the
> above two code blocks used to incur the same load on the MDS because the
> client would only asynchronously release the caps back to the MDS.  Did we
> change that behavior?  Maybe we can restore it, but with a very short
> delay, so that we cover patterns like the above?
>

With a simple python test, putting the open/close inside the loop [1],
I see an "update ino" cap message to the mds after each close (client
log, with message debugging enabled):

2019-07-03 15:30:08.012349 7f30bfd68700  1 --
137.138.157.69:0/165379415 --> 137.138.13.153:6800/2391910234 --
client_request(unknown.0:31983 open #0x5000177be10 2019-07-03
15:30:08.012345 caller_uid=0, caller_gid=0{0,}) v4 -- 0x56212b328300
con 0
2019-07-03 15:30:08.016138 7f30bd563700  1 --
137.138.157.69:0/165379415 --> 137.138.13.153:6800/2391910234 --
client_caps(update ino 0x5000177be10 135309857 seq 4811 tid 61814
caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=pAsxXsxFxcwb follows 1 size 3/3
ts 756/0 mtime 2019-07-03 15:30:08.015825) v11 -- 0x562126b5bf80 con 0
2019-07-03 15:30:08.016950 7f30c0569700  1 --
137.138.157.69:0/165379415 --> 137.138.13.153:6800/2391910234 --
client_caps(update ino 0x5000177be10 135309857 seq 4811 tid 61815
caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=pAsxXsxFxcwb follows 1 size 6/3
ts 756/0 mtime 2019-07-03 15:30:08.016637) v11 -- 0x56212728e880 con 0
2019-07-03 15:30:08.017848 7f30bf567700  1 --
137.138.157.69:0/165379415 --> 137.138.13.153:6800/2391910234 --
client_caps(update ino 0x5000177be10 135309857 seq 4811 tid 61816
caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=pAsxXsxFxcwb follows 1 size 9/3
ts 756/0 mtime 2019-07-03 15:30:08.017439) v11 -- 0x56212148b180 con 0

and the flush_ack's coming back from the mds (captured a bit later, so
they don't correspond exactly to the updates above):

2019-07-03 15:32:37.831384 7f30c4d72700  1 --
137.138.157.69:0/165379415 <== mds.4 137.138.13.153:6800/2391910234
227525 ==== client_caps(grant ino 0x5000177be10 135309857 seq 4812
caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 1206/0
ts 756/0 mtime 2019-07-03 15:30:08.385521) v11 ==== 299+0+0
(1635817588 0 0) 0x5621252efa80 con 0x56211eefb800
2019-07-03 15:32:37.871171 7f30c4d72700  1 --
137.138.157.69:0/165379415 <== mds.4 137.138.13.153:6800/2391910234
227527 ==== client_caps(grant ino 0x5000177be10 135309857 seq 4814
caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size
1206/4194304 ts 756/0 mtime 2019-07-03 15:30:08.385521) v11 ====
299+0+0 (3626990096 0 0) 0x56212361bb00 con 0x56211eefb800
2019-07-03 15:32:41.018963 7f30c4d72700  1 --
137.138.157.69:0/165379415 <== mds.4 137.138.13.153:6800/2391910234
227528 ==== client_caps(flush_ack ino 0x5000177be10 135309857 seq 4814
tid 62216 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 0 size 0/0
mtime 0.000000) v11 ==== 252+0+0 (197724779 0 0) 0x56212b20e900 con
0x56211eefb800
2019-07-03 15:32:41.019026 7f30c4d72700  1 --
137.138.157.69:0/165379415 <== mds.4 137.138.13.153:6800/2391910234
227529 ==== client_caps(flush_ack ino 0x5000177be10 135309857 seq 4814
tid 62217 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 0 size 0/0
mtime 0.000000) v11 ==== 252+0+0 (197724779 0 0) 0x5621252e9600 con
0x56211eefb800
2019-07-03 15:32:41.019047 7f30c4d72700  1 --
137.138.157.69:0/165379415 <== mds.4 137.138.13.153:6800/2391910234
227530 ==== client_caps(flush_ack ino 0x5000177be10 135309857 seq 4814
tid 62218 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 0 size 0/0
mtime 0.000000) v11 ==== 252+0+0 (197724779 0 0) 0x562123402480 con
0x56211eefb800

With the open/close outside the loop, there's very little conversation
with the mds: just cap renewals and an occasional flush_ack.

This is with 12.2.12 mds, 12.2.12 fuse client.
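
As an illustration of the "restore it, but with a very short delay"
idea, here's a minimal sketch of the pattern (illustration only;
nothing like the real ceph-fuse cap machinery, and all of the names
below are made up):

import threading

class DelayedCapRelease:
    # Sketch: keep caps for a short grace period after close() so
    # that a quick re-open reuses them instead of round-tripping to
    # the mds.
    def __init__(self, delay=0.5):
        self.delay = delay    # grace period in seconds
        self.pending = {}     # ino -> timer for the deferred release
        self.lock = threading.Lock()

    def on_open(self, ino, request_caps):
        # If a release is still pending, cancel it: the cached caps
        # are still valid and no message to the mds is needed.
        with self.lock:
            timer = self.pending.pop(ino, None)
        if timer is not None:
            timer.cancel()
            return
        request_caps(ino)     # otherwise ask the mds as usual

    def on_close(self, ino, release_caps):
        # Defer the real release; a tight open/close loop then costs
        # one acquire/release pair instead of one per iteration.
        def deferred():
            with self.lock:
                self.pending.pop(ino, None)
            release_caps(ino)
        timer = threading.Timer(self.delay, deferred)
        with self.lock:
            self.pending[ino] = timer
        timer.start()

With something like that in place, the open/close-inside-the-loop
pattern should generate roughly the same mds traffic as the hoisted
version.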

-- Dan

[1]
while True:
    # append or truncate mode makes no difference to the client-mds
    # traffic, afaict
    f = open('out.dat', 'a')
    f.write('aaa')
    f.close()

> sage
>
>
> >
> > In a multi-user environment it's hard to scrutinize every user's
> > application, so we'd prefer to just throttle down the client req rates
> > (and let them suffer from the poor performance).
> >
> > Thoughts?
> >
> > Thanks,
> >
> > Dan
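
On the throttle question above: what we have in mind is essentially a
token bucket on incoming client requests, something like the sketch
below (again just an illustration, not existing mds code, and the
rate/burst numbers are invented):

import time

class PerClientThrottle:
    # Sketch: allow each client a sustained request rate plus a small
    # burst; callers block when their bucket is empty.
    def __init__(self, rate=500.0, burst=1000.0):
        self.rate = rate      # sustained requests/sec per client
        self.burst = burst    # bucket capacity (burst allowance)
        self.buckets = {}     # client_id -> (tokens, last_refill)

    def throttle(self, client_id):
        now = time.monotonic()
        tokens, last = self.buckets.get(client_id, (self.burst, now))
        # Refill in proportion to elapsed time, capped at burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            # Bucket empty: delay the request until a token is due,
            # so a heavy client just sees higher latency, not errors.
            time.sleep((1.0 - tokens) / self.rate)
            tokens = 1.0
        self.buckets[client_id] = (tokens - 1.0, time.monotonic())

In the mds this would presumably hang off the client session in the
request dispatch path.
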
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


