Re: cephfs: apache locks up after parallel reloads on multiple nodes

Yeah, CephFS is much closer to POSIX semantics for a filesystem than
NFS. There's an experimental relaxed mode called LazyIO but I'm not
sure if it's applicable here.

You can debug this by dumping the slow requests from the MDS servers via
their admin sockets.
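For example, here is a rough sketch that pulls the in-flight ops from one MDS and lists those older than a threshold. It assumes Python 3.7+ and that it runs on the MDS host, since `ceph daemon` talks to the local admin socket. `dump_ops_in_flight` is an existing admin socket command. The JSON layout (an "ops" list whose entries carry "age" in seconds and "description") is an assumption, so check it against your release. `fetch_mds_ops` and `slow_ops` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any, Dict, List


def fetch_mds_ops(mds_name: str) -> Dict[str, Any]:
    """Query one MDS daemon's admin socket for its in-flight ops.

    Must run on the host where that MDS lives, because `ceph daemon`
    connects to the local admin socket.
    """
    out = subprocess.run(
        ["ceph", "daemon", f"mds.{mds_name}", "dump_ops_in_flight"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


def slow_ops(dump: Dict[str, Any], min_age_s: float = 30.0) -> List[Dict[str, Any]]:
    """Return ops at least min_age_s seconds old, oldest first.

    Assumption: the dump has an "ops" list and each entry has an
    "age" field (seconds); entries without "age" are treated as 0.
    """
    ops = [op for op in dump.get("ops", []) if float(op.get("age", 0)) >= min_age_s]
    return sorted(ops, key=lambda op: float(op["age"]), reverse=True)
```

Usage on the MDS host, with a placeholder daemon name: `for op in slow_ops(fetch_mds_ops("<name>")): print(op["age"], op.get("description"))`. Running this on every MDS while the Apache nodes are hanging should show which requests are stuck and for how long.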


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Sep 12, 2019 at 5:07 PM Stefan Kooman <stefan@xxxxxx> wrote:
>
> Dear list,
>
> We recently switched the shared storage for our Linux shared hosting
> platforms from "nfs" to "cephfs". The performance improvements are
> noticeable. It all works fine; however, there is one peculiar thing:
> when Apache reloads after a logrotate of the "error" logs, all but one
> node hang for ~15 minutes. The log rotations are scheduled via cron,
> and the nodes themselves are synced with NTP. The first node that
> reloads Apache keeps on working, all the others hang, and after
> ~15 minutes they all recover almost simultaneously.
>
> Our setup looks like this: 10 webservers all sharing the same CephFS
> filesystem. Each webserver, with around 100 Apache threads, has around
> 10,000 open file handles to "error" logs on CephFS. To be clear, all
> webservers have file handles on _the same_ "error" logs. The logrotate
> takes around two seconds on the "surviving" node.
>
> What could be the reason for this? Does it have something to do with
> file locking, i.e. does CephFS behave differently (more strictly) than
> NFS here? What would be a good way to find the root cause? We have
> sysdig traces from different nodes, but on the nodes where Apache
> hangs not much is going on ... until it all recovers.
>
> We remediated this by delaying the Apache reloads on all but one node.
> With that, there is no issue at all, even though all the other web
> servers still reload at almost the same time.
>
> Any information or hints on how to investigate this issue further are
> highly appreciated.
>
> Gr. Stefan
>
> --
> | BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
> | GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com