On Fri, Dec 15, 2017 at 6:54 PM, Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
> Hello, Mr. Yan
>
> On Thu, Dec 14, 2017 at 11:36 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>
>> The client holds so many capabilities because the kernel keeps lots of
>> inodes in its cache. The kernel does not trim inodes by itself if it has
>> no memory pressure. It seems you have set the mds_cache_size config to a
>> large value.
>
> Yes, I have set mds_cache_size = 3000000
> I usually set this value according to the number of ceph.dir.rentries in
> cephfs. Isn't that a good approach?
>
> I have 2 directories in the cephfs root; the sum of ceph.dir.rentries is
> 4670933, for which I would set mds_cache_size to 5M (if I had enough RAM
> for that in the MDS server).
>
> # getfattr -d -m ceph.dir.* index
> # file: index
> ceph.dir.entries="776"
> ceph.dir.files="0"
> ceph.dir.rbytes="52742318965"
> ceph.dir.rctime="1513334528.09909569540"
> ceph.dir.rentries="709233"
> ceph.dir.rfiles="459512"
> ceph.dir.rsubdirs="249721"
> ceph.dir.subdirs="776"
>
> # getfattr -d -m ceph.dir.* mail
> # file: mail
> ceph.dir.entries="786"
> ceph.dir.files="1"
> ceph.dir.rbytes="15000378101390"
> ceph.dir.rctime="1513334524.0993982498"
> ceph.dir.rentries="3961700"
> ceph.dir.rfiles="3531068"
> ceph.dir.rsubdirs="430632"
> ceph.dir.subdirs="785"
>
>> mds cache size isn't large enough, so the mds does not ask
>> the client to trim its inode cache either. This can affect
>> performance. We should make the mds recognize idle clients and ask
>> them to trim their caps more aggressively.
>
> I think you mean that the mds cache IS large enough, right? So it doesn't
> bother the clients.
>
>> This can affect performance. We should make the mds recognize idle
>> clients and ask them to trim their caps more aggressively.
>
> One recurrent problem I have, which I guess is caused by a network issue
> (ceph cluster in a vrack), is that my MDS servers start switching which
> one is the active.
> This happens after a lease_timeout occurs on the mon; then I get "dne in
> the mds map" from the active MDS and it suicides.
> Even though I use standby-replay, the standby takes from 15 min up to
> 2 hours to take over as active. I see that it loads all inodes (by
> issuing "perf dump mds" on the mds daemon).
>
> So the question is: if the number of caps were as low as it is supposed
> to be (around 300k) instead of 5M, would the MDS become active faster in
> such a failure case?

Yes, MDS recovery should be faster when clients hold fewer caps. Recent
versions of the kernel client and ceph-fuse should trim their caches
aggressively when the MDS recovers.

Regards
Yan, Zheng

>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> Belo Horizonte - Brasil
> IRC NICK - WebertRLZ

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
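
For anyone following the cap-count discussion above: a minimal sketch of how
to check, from the MDS admin socket, how many caps each client currently
holds and how many inodes/caps the MDS itself is tracking. This assumes the
commands are run on the MDS host and that the daemon is named mds.a (a
placeholder; substitute your own daemon id, and note that exact field names
may vary a little between releases):

# ceph daemon mds.a session ls | grep -E '"id"|"num_caps"'
# ceph daemon mds.a perf dump mds | grep -E '"inodes"|"caps"'

The first command lists client sessions with their per-client cap counts; the
second shows the MDS-side counters for cached inodes and issued caps. Comparing
those numbers against mds_cache_size gives a rough sense of how much client
state the MDS would have to reconnect during a takeover, which is what the
recovery-time question above hinges on.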