On Thu, Jun 29, 2017 at 4:30 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> Hi All,
>
> Putting out a call for help to see if anyone can shed some light on this.
>
> Configuration:
> Ceph cluster presenting RBDs -> XFS -> NFS -> ESXi
> Running 10.2.7 on the OSDs and a 4.11 kernel on the NFS gateways in a
> pacemaker cluster
> Both OSDs and clients go into a pair of switches, single L2 domain (no
> sign from pacemaker that there are network connectivity issues)
>
> Symptoms:
> - All RBDs on a single client randomly hang for 30s to several minutes,
>   confirmed by pacemaker and by the ESXi hosts complaining

Hi Nick,

What is a "single client" here?

> - Cluster load is minimal when this happens most times

Can you post gateway syslog and point at when this happened?
Corresponding pacemaker excerpts won't hurt either.

> - All other clients with RBDs are not affected (same RADOS pool), so it
>   seems more of a client issue than a cluster issue
> - It looks like pacemaker tries to also stop the RBD+FS resource, but this
>   also hangs
> - Eventually pacemaker succeeds in stopping the resources and immediately
>   restarts them, and IO returns to normal
> - No errors, slow requests, or any other non-normal Ceph status is reported
>   on the cluster or in ceph.log
> - Client logs show nothing apart from pacemaker
>
> Things I've tried:
> - Different kernels (potentially happened less with older kernels, but I
>   can't be 100% sure)

But still happened?  Do you have a list of all the kernels you've tried?

> - Disabling scrubbing and anything else that could be causing high load
> - Enabling kernel RBD debugging (the problem maybe happens a couple of
>   times a day, so debug logging was not practical as I can't reproduce it
>   on demand)

When did it start occurring?  Can you think of any configuration changes
that might have been the trigger, or is this a new setup?

Thanks,

                Ilya
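
P.S. Since leaving rbd debug logging on all day isn't practical, one option
is to enable kernel dynamic debug for the rbd and libceph modules only for a
short window around the times the hangs tend to show up, and read the output
back from the ring buffer afterwards.  Below is a minimal, untested Python
sketch of that idea; it assumes debugfs is mounted at /sys/kernel/debug,
that the gateway kernel was built with CONFIG_DYNAMIC_DEBUG, and the
300-second default window is just a placeholder.  Be warned that "+p" on
rbd/libceph is very verbose.

#!/usr/bin/env python3
"""Sketch: enable kernel rbd/libceph dynamic debug for a bounded window,
then switch it off again, so the ring buffer isn't flooded all day.
Run as root on the NFS gateway; read the output back with `dmesg -T`."""

import sys
import time

# Assumes debugfs is mounted here and CONFIG_DYNAMIC_DEBUG is enabled.
CONTROL = "/sys/kernel/debug/dynamic_debug/control"
MODULES = ("rbd", "libceph")

def set_debug(enabled):
    flag = "+p" if enabled else "-p"
    for mod in MODULES:
        # One command per write(), e.g. "module rbd +p".
        with open(CONTROL, "w") as ctl:
            ctl.write("module %s %s\n" % (mod, flag))

def main():
    # Window length in seconds; 300 is an arbitrary placeholder.
    window = int(sys.argv[1]) if len(sys.argv) > 1 else 300
    set_debug(True)
    try:
        time.sleep(window)
    finally:
        # Always turn debugging back off, even on Ctrl-C.
        set_debug(False)

if __name__ == "__main__":
    main()

Something like this could be kicked off from cron around the busy periods or
from a pacemaker alert, and once a hang has been confirmed the corresponding
window can be pulled out of `dmesg -T` or the gateway syslog.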