Re: Kernel mounted RBD's hanging

Nick Fisk <nick@xxxxxxxxxx> · Fri, 30 Jun 2017 13:12:44 +0100

From: Alex Gorbachev [mailto:ag@xxxxxxxxxxxxxxxxxxx] 
Sent: 30 June 2017 03:54
To: Ceph Users <ceph-users@xxxxxxxxxxxxxx>; nick@xxxxxxxxxx
Subject: Re:  Kernel mounted RBD's hanging

On Thu, Jun 29, 2017 at 10:30 AM Nick Fisk <nick@xxxxxxxxxx> wrote:
Hi All,

Putting out a call for help to see if anyone can shed some light on this.

Configuration:
Ceph cluster presenting RBD's->XFS->NFS->ESXi
Running 10.2.7 on the OSD's and 4.11 kernel on the NFS gateways in a
pacemaker cluster
Both OSDs and clients go into a pair of switches, single L2 domain (no
sign from pacemaker that there are network connectivity issues)

Symptoms:
- All RBD's on a single client randomly hang for 30s to several minutes,
confirmed by pacemaker and ESXi hosts complaining
- Cluster load is minimal when this happens most times
- All other clients with RBD's are not affected (same RADOS pool), so it
seems more of a client issue than a cluster issue
- It looks like pacemaker also tries to stop the RBD+FS resource, but this
also hangs
- Eventually pacemaker succeeds in stopping resources and immediately
restarts them, IO returns to normal
- No errors, slow requests, or any other abnormal Ceph status are reported
on the cluster or in ceph.log
- Client logs show nothing apart from pacemaker

Things I've tried:
- Different kernels (potentially happened less with older kernels, but can't
be 100% sure)
- Disabling scrubbing and anything else that could be causing high load
- Enabling Kernel RBD debugging (Problem maybe happens a couple of times a
day, debug logging was not practical as I can't reproduce on demand)
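
Since full debug logging isn't practical for something that only shows up a couple of times a day, a lighter option might be to periodically snapshot the kernel client's in-flight OSD requests from debugfs, so there is something to look at after a hang. A minimal Python sketch, assuming debugfs is mounted at /sys/kernel/debug and the kernel client exposes an osdc file per client instance under ceph/ (reading it needs root). The function names, the 5 second interval and the log path are made up for illustration:

```python
import glob
import time
from pathlib import Path

# Assumption: debugfs is mounted here and each kernel client instance
# has its own directory under ceph/ containing an "osdc" file.
DEBUGFS_GLOB = "/sys/kernel/debug/ceph/*/osdc"


def snapshot_osdc(pattern=DEBUGFS_GLOB):
    """Return {path: contents} for every osdc file matching pattern."""
    snapshot = {}
    for path in sorted(glob.glob(pattern)):
        try:
            snapshot[path] = Path(path).read_text()
        except OSError as exc:
            # Not root, or the client went away between glob and read.
            snapshot[path] = f"<unreadable: {exc}>"
    return snapshot


def format_snapshot(snapshot, now=None):
    """Render a snapshot as timestamped text for appending to a log file."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    lines = [f"=== {stamp} ==="]
    for path, text in snapshot.items():
        lines.append(f"--- {path}")
        lines.append(text.rstrip() or "(empty)")
    return "\n".join(lines) + "\n"


def poll_forever(log_path="/var/tmp/osdc-snapshots.log", interval_s=5):
    """Append a snapshot to log_path every interval_s seconds (hypothetical defaults)."""
    while True:
        with open(log_path, "a") as log:
            log.write(format_snapshot(snapshot_osdc()))
        time.sleep(interval_s)
```

Running poll_forever() as root on the affected gateway and then looking at the snapshots around the time pacemaker complains should show whether requests were stuck against a particular OSD. The raw file contents are logged unparsed because the osdc layout differs between kernel versions.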

Anyone have any ideas?

Nick, are you using any network aggregation, LACP?  Can you drop to the simplest possible configuration to make sure there's nothing on the network switch side?

Hi Alex,

The OSD nodes use an active/backup bond, and the active NIC on each one goes into the same switch. The NFS gateways are currently VMs, but again the hypervisor is using a NIC on the same switch. The cluster and public networks are VLANs on the same NIC, and I don’t get any alerts from monitoring/pacemaker to suggest there are comms issues. But I will look into getting some ping logs done to see if they reveal anything.
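
For the ping logs, something along these lines could do the job. It is only a sketch: it assumes the Linux iputils ping (for the -n, -c and -W options and the "time=... ms" reply format), and the function names and host list are placeholders rather than anything from the actual setup:

```python
import re
import subprocess
import time

# Matches the "time=0.231 ms" field that ping prints for each reply.
RTT_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtt_ms(ping_output):
    """Return the round-trip time in ms from ping output, or None if no reply."""
    match = RTT_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(host, timeout_s=1):
    """Send one ICMP echo via the system ping; return RTT in ms or None on loss."""
    try:
        result = subprocess.run(
            ["ping", "-n", "-c", "1", "-W", str(timeout_s), host],
            capture_output=True, text=True, timeout=timeout_s + 2,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return parse_rtt_ms(result.stdout) if result.returncode == 0 else None


def log_line(host, rtt_ms, now=None):
    """Format one timestamped log entry; lost pings are marked LOST."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    status = f"{rtt_ms:.3f} ms" if rtt_ms is not None else "LOST"
    return f"{stamp} {host} {status}"


def run(hosts, interval_s=1.0):
    """Ping every host in turn, forever, printing one line per probe."""
    while True:
        for host in hosts:
            print(log_line(host, ping_once(host)), flush=True)
        time.sleep(interval_s)
```

Calling run() with the OSD node and gateway addresses from the client that hangs, with output redirected to a file, gives timestamps that can be lined up against the pacemaker log entries for each hang.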

Did you check the ceph.log for any anomalies?

Yep, completely clean

Any occurrences on OSD nodes, anything in their OSD logs or syslogs?

Not that I can see. I’m using cache tiering, so all IO travels through a few OSDs. I guess this might make it easier to try and see what’s going on. But the random nature of it means it’s not always easy to catch.

Any odd page cache settings on the clients?

The only customizations on the clients are readahead, some TCP tunings and min free kbytes.
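
To compare those settings between an affected and an unaffected client, a small dump script may help. This is a sketch only: the thread does not say which TCP sysctls were tuned, so the list below is a guess at common candidates, and the helper names are invented for the example. It also assumes the kernel RBD devices show up as /sys/block/rbd*:

```python
import glob
from pathlib import Path

# Assumption: the exact TCP tunings are not listed in the thread,
# so these are just commonly adjusted ones.
SYSCTL_PATHS = [
    "/proc/sys/vm/min_free_kbytes",
    "/proc/sys/net/ipv4/tcp_rmem",
    "/proc/sys/net/ipv4/tcp_wmem",
    "/proc/sys/net/core/rmem_max",
    "/proc/sys/net/core/wmem_max",
]
# Per-device readahead for kernel-mapped RBD devices.
READAHEAD_GLOB = "/sys/block/rbd*/queue/read_ahead_kb"


def read_settings(paths):
    """Return {path: stripped contents}, with None for unreadable paths."""
    settings = {}
    for path in paths:
        try:
            settings[path] = Path(path).read_text().strip()
        except OSError:
            settings[path] = None
    return settings


def collect(sysctl_paths=SYSCTL_PATHS, readahead_glob=READAHEAD_GLOB):
    """Gather the sysctls plus readahead for every matching block device."""
    return read_settings(list(sysctl_paths) + sorted(glob.glob(readahead_glob)))


def diff_settings(a, b):
    """Return {path: (value_in_a, value_in_b)} for every path that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Saving the output of collect() on each gateway and passing two of the results to diff_settings() shows at a glance whether the hanging client is tuned differently from the ones that are fine.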

Alex

Thanks,
Nick

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Alex Gorbachev
Storcium
