Re: cephfs mds failing to respond to capability release

is it possible to count open file descriptors in cephfs only?
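Not answered in the thread, but one Linux-side way to approximate this is to count open file descriptors (across all inspectable processes) whose targets resolve under the cephfs mount point. This is a sketch, not a cephfs-specific tool; the mount path `/mnt/cephfs` is an assumption, and it only sees processes the caller has permission to inspect:

```python
import os

def count_fds_under(mount):
    """Count open file descriptors whose target path lives under `mount`.
    Linux-only sketch: walks /proc/<pid>/fd for every process we can read."""
    mount = os.path.abspath(mount) + os.sep
    count = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # skip non-process entries like /proc/meminfo
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd closed between listdir and readlink
            if target.startswith(mount):
                count += 1
    return count

print(count_fds_under("/mnt/cephfs"))
```

Run as root to see all processes; as an unprivileged user it undercounts, since other users' `/proc/<pid>/fd` directories are not readable.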

On Wed, Nov 16, 2016 at 2:12 PM Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
I'm sorry, by server, I meant cluster.
On one cluster the rate of files created and read is about 5 per second.
On another cluster it's from 25 to 30 files created and read per second.

On Wed, Nov 16, 2016 at 2:03 PM Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
Hello John.

I'm sorry for the lack of information at the first post.
The same version is in use for servers and clients.

About the workload, it varies.
On one server it's about 5 files created/written and then fully read per second.
On the other server it's about 5 to 6 times that number, so a lot more, but the problem does not escalate in proportion.

~# ceph -v
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

~# dpkg -l | grep ceph
ii  ceph-fuse                            10.2.2-1trusty                   amd64        FUSE-based client for the Ceph distributed file system

Some things are worth mentioning:
The service(1) that creates the file sends an async request to another service(2) that reads it.
The service(1) that creates the file also deletes it when its client closes the connection, so it can delete the file while the other service(2) is still trying to read it. I'm not sure what happens in that case.
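On a POSIX filesystem (cephfs included) that delete-while-reading pattern is well defined: `unlink` removes the name, but a reader that already has the file open keeps reading until it closes, because the inode stays alive until the last open handle goes away. A minimal local sketch of the race described above:

```python
import os
import tempfile

# service(1) creates and writes the file
tmp = tempfile.NamedTemporaryFile(mode="w", delete=False)
tmp.write("message body")
tmp.close()

reader = open(tmp.name)   # service(2) opens it for reading
os.unlink(tmp.name)       # service(1) deletes it while still open
data = reader.read()      # read still succeeds: the inode lives until last close
reader.close()
print(data)
```

The failure mode to worry about is only the other ordering: if service(1) unlinks before service(2) gets to `open()`, the reader sees ENOENT.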



On Wed, Nov 16, 2016 at 1:42 PM John Spray <jspray@xxxxxxxxxx> wrote:
On Wed, Nov 16, 2016 at 3:15 PM, Webert de Souza Lima
<webert.boss@xxxxxxxxx> wrote:
> hi,
>
> I have many clusters running cephfs, and in the last 45 days or so, 2 of
> them started giving me the following message in ceph health:
> mds0: Client dc1-mx02-fe02:guest failing to respond to capability release
>
> When this happens, cephfs stops responding. It will only get back after I
> restart the failing mds.
>
> Also, I get the following logs from ceph.log
> https://paste.debian.net/896236/
>
> There was no change made that I can relate to this and I can't figure out
> what is happening.

I have the usual questions: what ceph versions, what clients etc
(http://docs.ceph.com/docs/jewel/cephfs/early-adopters/#reporting-issues)

Clients failing to respond to capability release are either buggy (old
kernels?) or it's also possible that you have a workload that is
holding an excessive number of files open.

Cheers,
John
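To check for a workload holding too many files open, `ceph daemon mds.<id> session ls` on the MDS host reports per-client session info including held capability counts. A sketch of picking out the heaviest clients from that JSON; the sample fields here are an approximation of the jewel-era output, not captured from a real cluster:

```python
import json

# Hypothetical sample resembling `ceph daemon mds.<id> session ls` output
sample = json.dumps([
    {"id": 4305, "num_caps": 150000,
     "client_metadata": {"hostname": "dc1-mx02-fe02"}},
    {"id": 4310, "num_caps": 1200,
     "client_metadata": {"hostname": "dc1-mx02-fe03"}},
])

def busiest_clients(session_json, threshold=100000):
    """Return (hostname, num_caps) for sessions holding more caps than threshold."""
    sessions = json.loads(session_json)
    return [(s["client_metadata"].get("hostname", "?"), s["num_caps"])
            for s in sessions if s["num_caps"] > threshold]

print(busiest_clients(sample))
```

A client with a cap count far above its peers is the usual suspect for "failing to respond to capability release".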



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
