Dear list,

We recently switched the shared storage for our Linux shared hosting platforms from NFS to CephFS. The performance improvements are noticeable. It all works fine; however, there is one peculiar thing: when Apache reloads after a logrotate of the "error" logs, all but one node will hang for ~ 15 minutes. The log rotations are scheduled via cron, and the nodes' clocks are synced with NTP. The first node that reloads Apache will keep on working; all the others will hang, and after a period of ~ 15 minutes they will all recover almost simultaneously.

Our setup looks like this: 10 webservers all sharing the same CephFS filesystem. Each webserver runs around 100 Apache threads and holds around 10,000 open file handles to "error" logs on CephFS. To be clear, all webservers have a file handle on _the same_ "error" logs. The logrotate takes around two seconds on the "surviving" node.

What could be the reason for this? Does it have something to do with file locking, i.e. does it behave differently (more strictly) on CephFS than on NFS? What would be a good way to find the root cause? We have sysdig traces of different nodes, but on the nodes where Apache hangs not a lot is going on ... until it all recovers.

We remediated this by delaying the Apache reloads on all but one node. Then there is no issue at all, even though all the other webservers still reload at almost the same time.

Any info / hints on how to investigate this issue further are highly appreciated.

Regards,

Stefan

--
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx