Quoting Paul Emmerich (paul.emmerich@xxxxxxxx):
> Yeah, CephFS is much closer to POSIX semantics for a filesystem than
> NFS. There's an experimental relaxed mode called LazyIO but I'm not
> sure if it's applicable here.

Out of curiosity, how would CephFS being more POSIX-compliant cause
this much delay in this situation? I'd understand if it took up to a
second or maybe two, but almost fifteen minutes, and then suddenly
/all/ servers recover at the same time?

Would this situation exist because we have so many open filehandles
per server? Or could it also appear in a simpler "two servers share a
CephFS" setup?

I'm so curious to find out what /causes/ this. "Closer to POSIX
semantics" doesn't cut it for me in this case. Not with the symptoms
we're seeing.

> You can debug this by dumping slow requests from the MDS servers via
> the admin socket

As far as I understood, there's not much to see on the MDS servers
when this issue pops up. E.g. no slow ops are logged during this event.

Regards,
-Sndr.
-- 
| I think i want a job cleaning mirrors...
| It's just something i can really see myself doing...
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7 FBD6 F3A9 9442 20CC 6CD2
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com