On Wed, Jul 3, 2019 at 2:58 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> On Tue, 2 Jul 2019, Dan van der Ster wrote:
> > Hi,
> >
> > Are there any plans to implement a per-client throttle on mds client requests?
> >
> > We just had an interesting case where a new cephfs user was hammering
> > an mds from several hosts. In the end we found that their code was
> > doing:
> >
> >   while d=getafewbytesofdata():
> >     f=open(file.dat)
> >     f.append(d)
> >     f.close()
> >
> > By changing their code to:
> >
> >   f=open(file.dat)
> >   while d=getafewbytesofdata():
> >     f.append(d)
> >   f.close()
> >
> > it completely removes their load on the mds (for obvious reasons).
>
> This is a tangential point, but: at some point in the past at least the
> above two code blocks used to incur the same load on the MDS because the
> client would only asynchronously release the caps back to the MDS. Did we
> change that behavior? Maybe we can restore it, but with a very short
> delay, so that we cover patterns like the above?
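Sage's suggestion can be made concrete with a toy model. This is a minimal sketch under stated assumptions: each close of a dirty file is assumed to cost one client_caps update to the MDS (as in Dan's log in this thread), and open() is assumed to be served locally from caps the client already holds. `ToyClient`, `run()` and the millisecond timer are hypothetical names invented for illustration; this is not how the Ceph client is written. It replays the open/append/close-per-write loop with and without a short flush delay and counts messages sent to the MDS.

```python
from typing import Optional


class ToyClient:
    """Toy model of one client appending to one file (hypothetical, not Ceph code).

    Counts client_caps update messages sent to the MDS. open() is assumed to be
    served locally from caps the client already holds, so it is not modelled.
    """

    def __init__(self, flush_delay_ms: int) -> None:
        self.flush_delay_ms = flush_delay_ms  # 0 means flush on every close
        self.now_ms = 0
        self.mds_messages = 0
        self.dirty = False
        self.flush_at: Optional[int] = None  # deadline of the pending flush

    def write(self) -> None:
        self.dirty = True  # size/mtime changed locally (dirty=Fw in the logs)

    def close(self) -> None:
        if not self.dirty:
            return
        if self.flush_delay_ms == 0:
            self._flush()  # synchronous: one update per close
        elif self.flush_at is None:
            # Arm the timer on the first dirty close only; later closes ride
            # along, so a steady writer flushes at most once per delay period.
            self.flush_at = self.now_ms + self.flush_delay_ms

    def advance(self, dt_ms: int) -> None:
        """Move the clock forward and fire the pending flush if it is due."""
        self.now_ms += dt_ms
        if self.flush_at is not None and self.now_ms >= self.flush_at:
            self._flush()

    def _flush(self) -> None:
        self.mds_messages += 1
        self.dirty = False
        self.flush_at = None


def run(flush_delay_ms: int, iterations: int, step_ms: int) -> int:
    """Replay the open/append/close-per-write loop; return MDS messages sent."""
    client = ToyClient(flush_delay_ms)
    for _ in range(iterations):
        client.write()
        client.close()
        client.advance(step_ms)
    client.advance(flush_delay_ms)  # drain any flush still pending
    return client.mds_messages


print(run(flush_delay_ms=0, iterations=1000, step_ms=1))    # 1000
print(run(flush_delay_ms=100, iterations=1000, step_ms=1))  # 10
```

Arming the timer on the first dirty close, rather than resetting it on every close, bounds a steady writer to one update per delay period; resetting it would let a tight loop postpone the flush indefinitely.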

With a simple python test, putting the open/close inside the loop [1], I see an "update ino" client_caps message to the mds after each close:

2019-07-03 15:30:08.012349 7f30bfd68700 1 -- 137.138.157.69:0/165379415 --> 137.138.13.153:6800/2391910234 -- client_request(unknown.0:31983 open #0x5000177be10 2019-07-03 15:30:08.012345 caller_uid=0, caller_gid=0{0,}) v4 -- 0x56212b328300 con 0
2019-07-03 15:30:08.016138 7f30bd563700 1 -- 137.138.157.69:0/165379415 --> 137.138.13.153:6800/2391910234 -- client_caps(update ino 0x5000177be10 135309857 seq 4811 tid 61814 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=pAsxXsxFxcwb follows 1 size 3/3 ts 756/0 mtime 2019-07-03 15:30:08.015825) v11 -- 0x562126b5bf80 con 0
2019-07-03 15:30:08.016950 7f30c0569700 1 -- 137.138.157.69:0/165379415 --> 137.138.13.153:6800/2391910234 -- client_caps(update ino 0x5000177be10 135309857 seq 4811 tid 61815 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=pAsxXsxFxcwb follows 1 size 6/3 ts 756/0 mtime 2019-07-03 15:30:08.016637) v11 -- 0x56212728e880 con 0
2019-07-03 15:30:08.017848 7f30bf567700 1 -- 137.138.157.69:0/165379415 --> 137.138.13.153:6800/2391910234 -- client_caps(update ino 0x5000177be10 135309857 seq 4811 tid 61816 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=pAsxXsxFxcwb follows 1 size 9/3 ts 756/0 mtime 2019-07-03 15:30:08.017439) v11 -- 0x56212148b180 con 0

and the flush_acks from the mds (not exactly corresponding to the above):

2019-07-03 15:32:37.831384 7f30c4d72700 1 -- 137.138.157.69:0/165379415 <== mds.4 137.138.13.153:6800/2391910234 227525 ==== client_caps(grant ino 0x5000177be10 135309857 seq 4812 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 1206/0 ts 756/0 mtime 2019-07-03 15:30:08.385521) v11 ==== 299+0+0 (1635817588 0 0) 0x5621252efa80 con 0x56211eefb800
2019-07-03 15:32:37.871171 7f30c4d72700 1 -- 137.138.157.69:0/165379415 <== mds.4 137.138.13.153:6800/2391910234 227527 ==== client_caps(grant ino 0x5000177be10 135309857 seq 4814 caps=pAsxLsXsxFsxcrwb dirty=- wanted=pAsxXsxFxwb follows 0 size 1206/4194304 ts 756/0 mtime 2019-07-03 15:30:08.385521) v11 ==== 299+0+0 (3626990096 0 0) 0x56212361bb00 con 0x56211eefb800
2019-07-03 15:32:41.018963 7f30c4d72700 1 -- 137.138.157.69:0/165379415 <== mds.4 137.138.13.153:6800/2391910234 227528 ==== client_caps(flush_ack ino 0x5000177be10 135309857 seq 4814 tid 62216 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 0 size 0/0 mtime 0.000000) v11 ==== 252+0+0 (197724779 0 0) 0x56212b20e900 con 0x56211eefb800
2019-07-03 15:32:41.019026 7f30c4d72700 1 -- 137.138.157.69:0/165379415 <== mds.4 137.138.13.153:6800/2391910234 227529 ==== client_caps(flush_ack ino 0x5000177be10 135309857 seq 4814 tid 62217 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 0 size 0/0 mtime 0.000000) v11 ==== 252+0+0 (197724779 0 0) 0x5621252e9600 con 0x56211eefb800
2019-07-03 15:32:41.019047 7f30c4d72700 1 -- 137.138.157.69:0/165379415 <== mds.4 137.138.13.153:6800/2391910234 227530 ==== client_caps(flush_ack ino 0x5000177be10 135309857 seq 4814 tid 62218 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 0 size 0/0 mtime 0.000000) v11 ==== 252+0+0 (197724779 0 0) 0x562123402480 con 0x56211eefb800

With open/close outside the loop, there's very little conversation with the mds: just cap renewals and an occasional flush_ack.

This is with a 12.2.12 mds and a 12.2.12 fuse client.

-- Dan

[1]
while True:
    f = open('out.dat', 'a')  # append or truncate doesn't make a difference to the client-mds traffic, afaict
    f.write('aaa')
    f.close()

> sage
>
> >
> > In a multi-user environment it's hard to scrutinize every user's
> > application, so we'd prefer to just throttle down the client req rates
> > (and let them suffer from the poor performance).
> >
> > Thoughts?
> >
> > Thanks,
> >
> > Dan
> > _______________________________________________
> > Dev mailing list -- dev@xxxxxxx
> > To unsubscribe send an email to dev-leave@xxxxxxx
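Dan's original question, a per-client throttle on mds client requests, is the kind of problem a per-client token bucket is commonly used for. The sketch below is only an illustration under assumptions: `PerClientThrottle`, `allow()`, the string client ids and the rate/burst numbers are hypothetical, and nothing here reflects an existing Ceph MDS option or code path. It shows an admission check that could run for each incoming request from a client session.

```python
import time
from typing import Callable, Dict


class PerClientThrottle:
    """Hypothetical per-client token bucket for MDS requests (not Ceph code)."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate    # sustained requests per second allowed per client
        self.burst = burst  # bucket size: how many requests may arrive at once
        self.clock = clock  # injectable so the example below is deterministic
        self.tokens: Dict[str, float] = {}
        self.last_seen: Dict[str, float] = {}

    def allow(self, client_id: str) -> bool:
        """Return True if this client's request may proceed now."""
        now = self.clock()
        tokens = self.tokens.get(client_id, self.burst)  # new clients start full
        elapsed = now - self.last_seen.get(client_id, now)
        tokens = min(self.burst, tokens + elapsed * self.rate)  # refill
        self.last_seen[client_id] = now
        if tokens >= 1.0:
            self.tokens[client_id] = tokens - 1.0  # spend one token
            return True
        self.tokens[client_id] = tokens  # out of tokens: caller should wait
        return False


fake_now = [0.0]
throttle = PerClientThrottle(rate=10.0, burst=5.0, clock=lambda: fake_now[0])

# client.a fires 20 requests at the same instant: only the burst gets through.
print(sum(throttle.allow("client.a") for _ in range(20)))  # 5
# client.b has its own bucket and is unaffected.
print(throttle.allow("client.b"))  # True
# One second later client.a has refilled, but only up to the burst size.
fake_now[0] = 1.0
print(sum(throttle.allow("client.a") for _ in range(20)))  # 5
```

A token bucket lets a client burst briefly (up to `burst` requests) while capping its sustained rate at `rate` per second, and each client's bucket is independent, so one noisy session does not slow the others. A real MDS would presumably queue or delay a request that fails `allow()` rather than reject it, so a throttled user sees poor performance instead of errors.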