is it possible to count open file descriptors in cephfs only?
On Wed, Nov 16, 2016 at 2:12 PM Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
I'm sorry, by server, I meant cluster.
On one cluster the rate of files created and read is about 5 per second.
On another cluster it's from 25 to 30 files created and read per second.

On Wed, Nov 16, 2016 at 2:03 PM Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
Hello John.
I'm sorry for the lack of information in the first post.
The same version is in use for servers and clients.
About the workload, it varies.
On one server it's about 5 files created/written and then fully read per second.
On the other server it's about 5 to 6 times that number, so a lot more, but the problem does not escalate in the same proportion.
~# ceph -v
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
~# dpkg -l | grep ceph
ii ceph-fuse 10.2.2-1trusty amd64 FUSE-based client for the Ceph distributed file system
Some things are worth mentioning:
The service(1) that creates the file sends an async request to another service(2) that reads it.
The service(1) that creates the file also deletes it when its client closes the connection, so it can do so while the other service(2) is trying to read it. I'm not sure what would happen here.

On Wed, Nov 16, 2016 at 1:42 PM John Spray <jspray@xxxxxxxxxx> wrote:
On Wed, Nov 16, 2016 at 3:15 PM, Webert de Souza Lima
<webert.boss@xxxxxxxxx> wrote:
> hi,
>
> I have many clusters running cephfs, and in the last 45 days or so, 2 of
> them started giving me the following message in ceph health:
> mds0: Client dc1-mx02-fe02:guest failing to respond to capability release
>
> When this happens, cephfs stops responding. It only comes back after I
> restart the failing mds.
>
> Also, I get the following logs from ceph.log
> https://paste.debian.net/896236/
>
> There was no change made that I can relate to this and I can't figure out
> what is happening.
I have the usual questions: what ceph versions, what clients etc
(http://docs.ceph.com/docs/jewel/cephfs/early-adopters/#reporting-issues)
Clients failing to respond to capability release are either buggy (old
kernels?) or it's also possible that you have a workload that is
holding an excessive number of files open.
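
To check the second possibility on a client, one option is to count the open file descriptors that point into the CephFS mount. What follows is only a minimal sketch, assuming a Linux client where /proc/<pid>/fd is readable. The function name count_open_files_under and the mount point /mnt/cephfs are hypothetical, not part of any Ceph tool. The result is a rough proxy: the MDS tracks capabilities, which a client can also hold for files it no longer has open.

```python
import os


def count_open_files_under(mount_point, proc_root="/proc"):
    """Count open descriptors whose target lies under mount_point.

    Returns a dict mapping PID -> number of matching descriptors.
    proc_root is a parameter only so the scan can be pointed at a
    different directory tree; on a real client it stays "/proc".
    """
    root = os.path.realpath(mount_point)
    prefix = os.path.join(root, "")  # trailing slash avoids matching /mnt/cephfs2
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        matches = 0
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            # Unlinked-but-open files show up as "<path> (deleted)" and
            # still match the prefix, so they are counted too.
            if target == root or target.startswith(prefix):
                matches += 1
        if matches:
            counts[int(entry)] = matches
    return counts
```

Running count_open_files_under("/mnt/cephfs") as root (needed to see other users' processes) and summing the values gives a per-client total that can be compared between the two clusters.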
Cheers,
John
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>