Re: cephfs: apache locks up after parallel reloads on multiple nodes

Sander Smeenk <ssmeenk@xxxxxxxxxxxx> · Tue, 17 Sep 2019 17:11:54 +0200

Quoting Paul Emmerich (paul.emmerich@xxxxxxxx):

> Yeah, CephFS is much closer to POSIX semantics for a filesystem than
> NFS. There's an experimental relaxed mode called LazyIO but I'm not
> sure if it's applicable here.

Out of curiosity, how would CephFS being more POSIX-compliant cause
this much delay in this situation? I'd understand a delay of a second
or two, but almost fifteen minutes, after which /all/ servers suddenly
recover at the same time?

Would this situation exist because we have so many open filehandles per
server? Or could it also appear in a simpler "two servers share a
CephFS" setup?

I'm so curious to find out what /causes/ this.
"Closer to POSIX semantics" doesn't cut it for me in this case.
Not with the symptoms we're seeing.

> You can debug this by dumping slow requests from the MDS servers via
> the admin socket

As far as I understood, there's not much to see on the MDS servers when
this issue pops up. E.g. no slow ops are logged during this event.
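
For anyone who wants to double-check that during the next event, here is
a minimal sketch of filtering an in-flight ops dump by age. It assumes
the JSON produced by the MDS admin socket command `dump_ops_in_flight`
has an "ops" list whose entries carry "description" and "age" fields;
that is my understanding of the output and may differ between releases.
The helper name `stuck_ops`, the 30 second threshold and the sample data
are made up for illustration.

```python
import json


def stuck_ops(dump_text, min_age=30.0):
    """Return (age, description) pairs for ops at least min_age seconds old,
    oldest first. Field names are an assumption about the dump format."""
    dump = json.loads(dump_text)
    ops = dump.get("ops", [])
    hits = [
        (float(op.get("age", 0.0)), op.get("description", "?"))
        for op in ops
    ]
    hits = [hit for hit in hits if hit[0] >= min_age]
    return sorted(hits, reverse=True)


# Hypothetical sample, NOT real MDS output.
SAMPLE = json.dumps({
    "ops": [
        {"description": "hypothetical getattr request", "age": 850.2},
        {"description": "hypothetical setattr request", "age": 45.0},
        {"description": "hypothetical lookup request", "age": 0.4},
    ],
    "num_ops": 3,
})

if __name__ == "__main__":
    for age, description in stuck_ops(SAMPLE):
        print(f"{age:8.1f}s  {description}")
```

In practice the input would come from something like
`ceph daemon mds.<name> dump_ops_in_flight` run on the MDS host while
the web servers are hanging, with the output passed to `stuck_ops` as a
string. An empty result at that moment would support the observation
that the MDS itself is not holding slow requests.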

Regards,
-Sndr.
-- 
| I think i want a job cleaning mirrors...
| It's just something i can really see myself doing...
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7  FBD6 F3A9 9442 20CC 6CD2