On Fri, Dec 15, 2017 at 6:54 PM, Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
> Hello, Mr. Yan
>
> On Thu, Dec 14, 2017 at 11:36 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>
>> The client holds so many capabilities because the kernel keeps lots of
>> inodes in its cache. The kernel does not trim inodes by itself if it has
>> no memory pressure. It seems you have set the mds_cache_size config to a
>> large value.
>
> Yes, I have set mds_cache_size = 3000000
> I usually set this value according to the number of ceph.dir.rentries in
> cephfs. Isn't that a good approach?
>
> I have 2 directories in the cephfs root; the sum of ceph.dir.rentries is
> 4670933, for which I would set mds_cache_size to 5M (if I had enough RAM
> for that in the MDS server).
>
> # getfattr -d -m ceph.dir.* index
> # file: index
> ceph.dir.entries="776"
> ceph.dir.files="0"
> ceph.dir.rbytes="52742318965"
> ceph.dir.rctime="1513334528.09909569540"
> ceph.dir.rentries="709233"
> ceph.dir.rfiles="459512"
> ceph.dir.rsubdirs="249721"
> ceph.dir.subdirs="776"
>
> # getfattr -d -m ceph.dir.* mail
> # file: mail
> ceph.dir.entries="786"
> ceph.dir.files="1"
> ceph.dir.rbytes="15000378101390"
> ceph.dir.rctime="1513334524.0993982498"
> ceph.dir.rentries="3961700"
> ceph.dir.rfiles="3531068"
> ceph.dir.rsubdirs="430632"
> ceph.dir.subdirs="785"
>
>> mds cache size isn't large enough, so the mds does not ask
>> the client to trim its inode cache either. This can affect
>> performance. We should make the mds recognize idle clients and ask
>> them to trim their caps more aggressively.
>
> I think you mean that the mds cache IS large enough, right? So it doesn't
> bother the clients.
>
>> This can affect performance. We should make the mds recognize idle
>> clients and ask them to trim their caps more aggressively.
>
> One recurrent problem I have, which I guess is caused by a network issue
> (ceph cluster in a vrack), is that my MDS servers start switching which
> one is the active.
> This happens after a lease_timeout occurs on the mon; then I get "dne in
> the mds map" from the active MDS and it suicides.
> Even though I use standby-replay, the standby takes from 15 min up to
> 2 hours to take over as active. I see that it loads all inodes (by
> issuing "perf dump mds" on the mds daemon).
>
> So the question is: if the number of caps were as low as it is supposed
> to be (around 300k) instead of 5M, would the MDS become active faster in
> such a failure case?

Yes, MDS recovery should be faster when clients hold fewer caps. Recent
versions of the kernel client and ceph-fuse should trim their caches
aggressively when the MDS recovers.

Regards
Yan, Zheng

>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> Belo Horizonte - Brasil
> IRC NICK - WebertRLZ

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
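
For anyone following the cap-count discussion above: a minimal sketch of how
to check, from the MDS admin socket, how many caps each client currently
holds and how many inodes/caps the MDS itself is tracking. This assumes the
commands are run on the MDS host and that the daemon is named mds.a (a
placeholder; substitute your own daemon id, and note that exact field names
may vary a little between releases):

# ceph daemon mds.a session ls | grep -E '"id"|"num_caps"'
# ceph daemon mds.a perf dump mds | grep -E '"inodes"|"caps"'

The first command lists client sessions with their per-client cap counts; the
second shows the MDS-side counters for cached inodes and issued caps. Comparing
those numbers against mds_cache_size gives a rough sense of how much client
state the MDS would have to reconnect during a takeover, which is what the
recovery-time question above hinges on.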