Re: cephfs: apache locks up after parallel reloads on multiple nodes

On Tue, Sep 17, 2019 at 8:12 AM Sander Smeenk <ssmeenk@xxxxxxxxxxxx> wrote:
>
> Quoting Paul Emmerich (paul.emmerich@xxxxxxxx):
>
> > Yeah, CephFS is much closer to POSIX semantics for a filesystem than
> > NFS. There's an experimental relaxed mode called LazyIO but I'm not
> > sure if it's applicable here.
>
> Out of curiosity, how would CephFS being more POSIX compliant cause
> this much delay in this situation? I'd understand if it would maybe
> take up to a second or maybe two, but almost fifteen minutes and then
> suddenly /all/ servers recover at the same time?
>
> Would this situation exist because we have so many open filehandles per
> server? Or could it also appear in a simpler "two servers share a
> CephFS" setup?
>
> I'm so curious to find out what /causes/ this.
> "Closer to POSIX semantics" doesn't cut it for me in this case.
> Not with the symptoms we're seeing.

Yeah, this sounds weird. 15 minutes matches one or two timers, but I
can't think of any that should be related here.

I'd look at which syscalls the Apache daemons are making and how long
each one takes; in particular, what differs between the first server
and the rest. If they're making many of the same syscalls, just much
more slowly on the follow-on servers, that probably indicates they're
all hammering the CephFS cluster with conflicting updates (especially
if they're writes!) that NFS simply ignored and collapsed. If it's
just one syscall that takes minutes to complete, check the MDS admin
socket for ops_in_flight.
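
To compare syscall timings between the first server and the others, one
option is to capture a trace on each with something like
`strace -f -T -o /tmp/apache.strace -p <pid>` and summarize it. The
script below is only a rough sketch: the function names and the output
path are made up for illustration, and it assumes plain `-f -T` output
with no timestamp options. It also assumes the time printed on a
"resumed" line covers the whole call.

```python
import re
from collections import defaultdict

# Example lines this is meant to match (strace -f -T):
#   openat(AT_FDCWD, "/srv/www/x", O_RDONLY) = 3 <0.000123>
#   [pid 4242] flock(7, LOCK_EX) = 0 <0.000100>
#   4242 <... flock resumed>) = 0 <12.500000>
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"                          # optional pid prefix
    r"(?:(?P<name>\w+)\(|<\.\.\. (?P<resumed>\w+) resumed>)"  # syscall name
    r".*<(?P<secs>\d+\.\d+)>\s*$"                             # elapsed time from -T
)


def summarize(lines):
    """Return {syscall: {"calls", "total", "max"}} for timed strace lines."""
    stats = defaultdict(lambda: {"calls": 0, "total": 0.0, "max": 0.0})
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            # Signals, exit notices and "<unfinished ...>" halves carry no time.
            continue
        name = m.group("name") or m.group("resumed")
        secs = float(m.group("secs"))
        entry = stats[name]
        entry["calls"] += 1
        entry["total"] += secs
        entry["max"] = max(entry["max"], secs)
    return dict(stats)


def top_by_total(stats, n=10):
    """Return the n syscalls with the largest cumulative time, slowest first."""
    ranked = sorted(stats.items(), key=lambda kv: kv[1]["total"], reverse=True)
    return ranked[:n]
```

Feed it the trace from each server, e.g.
`top_by_total(summarize(open("/tmp/apache.strace")))`, and compare the
results. Similar call counts with much larger totals on the later
servers would point at contention; a single call with a huge "max"
would be the one to chase in ops_in_flight.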
-Greg

>
>
> > You can debug this by dumping slow requests from the MDS servers via
> > the admin socket
>
> As far as I understood, there's not much to see on the MDS servers when
> this issue pops up. E.g. no slow ops logged during this event.
>
>
> Regards,
> -Sndr.
> --
> | I think i want a job cleaning mirrors...
> | It's just something i can really see myself doing...
> | 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7  FBD6 F3A9 9442 20CC 6CD2
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



