Re: Kernel mounted RBD's hanging

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Thu, Jun 29, 2017 at 10:30 AM Nick Fisk <nick@xxxxxxxxxx> wrote:
Hi All,

Putting out a call for help to see if anyone can shed some light on this.

Configuration:
Ceph cluster presenting RBD's->XFS->NFS->ESXi
Running 10.2.7 on the OSD's and 4.11 kernel on the NFS gateways in a
pacemaker cluster
Both OSD's and clients are go into a pair of switches, single L2 domain (no
sign from pacemaker that there is network connectivity issues)

Symptoms:
- All RBD's on a single client randomly hang for 30s to several minutes,
confirmed by pacemaker and ESXi hosts complaining
- Cluster load is minimal when this happens most times
- All other clients with RBD's are not affected (Same RADOS pool), so its
seems more of a client issue than cluster issue
- It looks like pacemaker tries to also stop RBD+FS resource, but this also
hangs
- Eventually pacemaker succeeds in stopping resources and immediately
restarts them, IO returns to normal
- No errors, slow requests, or any other non normal Ceph status is reported
on the cluster or ceph.log
- Client logs show nothing apart from pacemaker

Things I've tried:
- Different kernels (potentially happened less with older kernels, but can't
be 100% sure)
- Disabling scrubbing and anything else that could be causing high load
- Enabling Kernel RBD debugging (Problem maybe happens a couple of times a
day, debug logging was not practical as I can't reproduce on demand)

Anyone have any ideas?

Nick, are you using any network aggregation, LACP?  Can you drop to a simplest possible configuration to make sure there's nothing on the network switch side?

Do you check the ceph.log for any anomalies?

Any occurrences on OSD nodes, anything in their OSD logs or syslogs?

Aany odd page cache settings on the clients?

Alex



Thanks,
Nick

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
--
Alex Gorbachev
Storcium
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux