On Fri, Dec 15, 2017 at 8:46 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> On Fri, Dec 15, 2017 at 6:54 PM, Webert de Souza Lima
> <webert.boss@xxxxxxxxx> wrote:
>> Hello, Mr. Yan
>>
>> On Thu, Dec 14, 2017 at 11:36 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>
>>> The client holds so many capabilities because the kernel keeps lots
>>> of inodes in its cache. The kernel does not trim inodes by itself if
>>> it has no memory pressure. It seems you have set the mds_cache_size
>>> config to a large value.
>>
>> Yes, I have set mds_cache_size = 3000000.
>> I usually set this value according to the number of ceph.dir.rentries
>> in CephFS. Isn't that a good approach?
>>
>> I have 2 directories in the CephFS root; the sum of their
>> ceph.dir.rentries is 4670933, for which I would set mds_cache_size to
>> 5M (if I had enough RAM for that in the MDS server).
>>
>> # getfattr -d -m ceph.dir.* index
>> # file: index
>> ceph.dir.entries="776"
>> ceph.dir.files="0"
>> ceph.dir.rbytes="52742318965"
>> ceph.dir.rctime="1513334528.09909569540"
>> ceph.dir.rentries="709233"
>> ceph.dir.rfiles="459512"
>> ceph.dir.rsubdirs="249721"
>> ceph.dir.subdirs="776"
>>
>> # getfattr -d -m ceph.dir.* mail
>> # file: mail
>> ceph.dir.entries="786"
>> ceph.dir.files="1"
>> ceph.dir.rbytes="15000378101390"
>> ceph.dir.rctime="1513334524.0993982498"
>> ceph.dir.rentries="3961700"
>> ceph.dir.rfiles="3531068"
>> ceph.dir.rsubdirs="430632"
>> ceph.dir.subdirs="785"
>>
>>> The mds cache size isn't large enough, so the mds does not ask the
>>> client to trim its inode cache either. This can affect performance.
>>> We should make the mds recognize idle clients and ask them to trim
>>> their caps more aggressively.
>>
>> I think you mean that the mds cache IS large enough, right? So it
>> doesn't bother the clients.

Yes, I mean the cache config is large enough.

>>> This can affect performance.
>>> We should make the mds recognize idle clients and ask them to trim
>>> their caps more aggressively.
>>
>> One recurrent problem I have, which I guess is caused by a network
>> issue (the Ceph cluster is in a vRack), is that my MDS servers start
>> switching over which one is the active.
>> This happens after a lease_timeout occurs in the mon; then I get "dne
>> in the mds map" from the active MDS and it suicides.
>> Even though I use standby-replay, the standby takes from 15 minutes
>> up to 2 hours to take over as active. I can see it loading all the
>> inodes (by issuing "perf dump mds" on the mds daemon).
>>
>> So, the question is: if the number of caps were as low as it is
>> supposed to be (around 300k) instead of 5M, would the MDS become
>> active faster in such a failure case?

300k is already quite a lot; opening that many inodes takes a long
time. Does your mail server really open so many files?

> Yes, mds recovery should be faster when clients hold fewer caps.
> Recent kernel clients and ceph-fuse should trim their caches
> aggressively when the mds recovers.

I checked the 4.4 kernel; it includes the code that trims the cache
when the mds recovers.

> Regards
> Yan, Zheng
>
>>
>> Regards,
>>
>> Webert Lima
>> DevOps Engineer at MAV Tecnologia
>> Belo Horizonte - Brasil
>> IRC NICK - WebertRLZ

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
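[Editor's note] For context, the setting discussed throughout the thread is the inode-count cache limit that Webert set to 3000000. A minimal ceph.conf fragment for that era would look like the sketch below; note that in Luminous and later this option was superseded by the byte-based mds_cache_memory_limit:

```ini
[mds]
# Maximum number of inodes the MDS will keep in its cache
# (pre-Luminous option; newer releases use mds_cache_memory_limit).
mds cache size = 3000000
```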
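[Editor's note] Webert's sizing rule in the thread (set mds_cache_size to roughly the total ceph.dir.rentries of the top-level directories, rounded up) can be sketched as a small shell calculation using the rentries values reported above. Rounding up to the next whole million is my assumption about what "set mds_cache_size to 5M" means, not something the thread states explicitly:

```shell
# ceph.dir.rentries values taken from the getfattr output in the thread
index_rentries=709233
mail_rentries=3961700

total=$((index_rentries + mail_rentries))
echo "total rentries: $total"

# Round up to the next whole million as a cache-size target
# (assumed interpretation of "I would set mds_cache_size to 5M").
suggested=$(( ((total + 999999) / 1000000) * 1000000 ))
echo "suggested mds_cache_size: $suggested"
```

On a live system the per-directory values would come from `getfattr --only-values -n ceph.dir.rentries <dir>`, as in the `getfattr -d -m ceph.dir.*` commands shown in the thread.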