Re: cephfs/ceph-fuse: mds0: Client XXX:XXX failing to respond to capability release

> On 14 September 2016 at 14:56, "Dennis Kramer (DT)" <dennis@xxxxxxxxx> wrote:
> 
> 
> Hi Burkhard,
> 
> Thank you for your reply, see inline:
> 
> On Wed, 14 Sep 2016, Burkhard Linke wrote:
> 
> > Hi,
> >
> >
> > On 09/14/2016 12:43 PM, Dennis Kramer (DT) wrote:
> >> Hi Goncalo,
> >> 
> >> Thank you. Yes, I have seen that thread, but I have no near-full OSDs and 
> >> my MDS cache size is pretty high.
> >
> > You can use the daemon socket on the mds server to get an overview of the 
> > current cache state:
> >
> > ceph daemon mds.XXX perf dump
> >
> > The message itself indicates that the mds is in fact trying to convince 
> > clients to release capabilities, probably because it is running out of cache.
> 
> My cache is set to mds_cache_size = 15000000, but you are right, it seems 
> the complete cache is used. That shouldn't be a real problem, though, if the 
> clients can release their caps in time. Correct me if I'm wrong, but this 
> cache_size is pretty high compared to the default (100k). I will raise the 
> mds_cache_size a bit and see if it helps.
> 

The 100k default is very, very conservative. Each cached inode consumes roughly 4 KiB of memory.

15,000,000 * 4 KiB ≈ 57 GiB of memory

Can you verify that the MDS is indeed using about that amount of memory?
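
Something along these lines on the active MDS host should give a rough idea (just a sketch; the exact perf counter layout, e.g. an 'mds_mem' section with 'ino' and 'rss' fields, can differ per release, so check your own perf dump output):

  # Cached inode count and MDS memory stats from the admin socket
  ceph daemon mds.XXX perf dump | jq '.mds_mem'

  # Cross-check against the resident size of the ceph-mds process
  ps -o rss,vsz,cmd -C ceph-mds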

If you have enough memory available, you can always increase the cache size on the MDS node(s). More caching in the MDS doesn't hurt in most situations.
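
For example (a sketch with an illustrative value, not a recommendation; please verify the option name and syntax against your release):

  # Bump the cache size at runtime via the admin socket on the MDS node
  ceph daemon mds.XXX config set mds_cache_size 20000000

  # And make it persistent in ceph.conf on the MDS node(s)
  [mds]
  mds_cache_size = 20000000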

Wido

> > The 'session ls' command on the daemon socket lists all current Ceph clients 
> > and the number of capabilities for each client. Depending on your workload / 
> > applications, you might be surprised how many capabilities are assigned to 
> > individual nodes...
> >
> > From the client's point of view, the error means that there's either a bug in the 
> > client, or an application is keeping a large number of files open (e.g. do 
> > you run mlocate on the clients?)
> I didn't have this issue when I was on Hammer, and the number of clients 
> hasn't changed. I have "ceph fuse.ceph fuse.ceph-fuse" in my PRUNEFS for 
> updatedb, so it's probably not mlocate that is causing this issue.
> The only real difference is my upgrade to Jewel.
> 
> 
> > If you use the kernel-based client, re-mounting won't help, since the internal 
> > state is kept the same (AFAIK). In case of the ceph-fuse client, the ugly way 
> > to get rid of the mount point is a lazy / forced umount and killing the 
> > ceph-fuse process if necessary. Processes with open file handles will 
> > complain afterwards.
> >
> >
> > Before using rude ways to terminate the client session, I would propose 
> > looking for rogue applications on the involved host. We had a number of problems 
> > with multithreaded applications and concurrent file access in the past (both 
> > with ceph-fuse from Hammer and kernel-based clients). lsof or other tools 
> > might help locate the application.
> 
> My cluster is back to HEALTH_OK; the involved host has been restarted by 
> the user. But I will debug some more on the host the next time I see this 
> issue.
> 
> PS: For completeness: I stated that this issue was often seen in my 
> current Jewel environment, but I meant to say that it only comes up 
> sometimes (so not that often). Still, when I *do* have this issue, it blocks 
> some I/O for clients as a consequence.
> 
> > Regards,
> > Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


