> -----Original Message-----
> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
> Sent: 30 June 2017 14:06
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Kernel mounted RBD's hanging
>
> On Fri, Jun 30, 2017 at 2:14 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >
> >> -----Original Message-----
> >> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
> >> Sent: 29 June 2017 18:54
> >> To: Nick Fisk <nick@xxxxxxxxxx>
> >> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> >> Subject: Re: Kernel mounted RBD's hanging
> >>
> >> On Thu, Jun 29, 2017 at 6:22 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >> -----Original Message-----
> >> >> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
> >> >> Sent: 29 June 2017 16:58
> >> >> To: Nick Fisk <nick@xxxxxxxxxx>
> >> >> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> >> >> Subject: Re: Kernel mounted RBD's hanging
> >> >>
> >> >> On Thu, Jun 29, 2017 at 4:30 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >> > Hi All,
> >> >> >
> >> >> > Putting out a call for help to see if anyone can shed some light on this.
> >> >> >
> >> >> > Configuration:
> >> >> > Ceph cluster presenting RBDs -> XFS -> NFS -> ESXi. Running 10.2.7 on
> >> >> > the OSDs and a 4.11 kernel on the NFS gateways, which form a pacemaker
> >> >> > cluster. Both OSDs and clients go into a pair of switches, single L2
> >> >> > domain (no sign from pacemaker that there are network connectivity
> >> >> > issues).
> >> >> >
> >> >> > Symptoms:
> >> >> > - All RBDs on a single client randomly hang for 30s to several
> >> >> >   minutes, confirmed by pacemaker and the ESXi hosts complaining
> >> >>
> >> >> Hi Nick,
> >> >>
> >> >> What is a "single client" here?
> >> >
> >> > I mean a node of the pacemaker cluster. So all RBDs on the same
> >> > pacemaker node hang.
> >> >
> >> >> > - Cluster load is minimal when this happens most times
> >> >>
> >> >> Can you post gateway syslog and point at when this happened?
> >> >> Corresponding pacemaker excerpts won't hurt either.
> >> >
> >> > Jun 28 16:35:38 MS-CEPH-Proxy1 lrmd[2026]: warning: p_export_ceph-ds1_monitor_60000 process (PID 17754) timed out
> >> > Jun 28 16:35:43 MS-CEPH-Proxy1 lrmd[2026]: crit: p_export_ceph-ds1_monitor_60000 process (PID 17754) will not die!
> >> > Jun 28 16:43:51 MS-CEPH-Proxy1 lrmd[2026]: warning: p_export_ceph-ds1_monitor_60000:17754 - timed out after 30000ms
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 IPaddr(p_vip_ceph-ds1)[28482]: INFO: ifconfig ens224:0 down
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 lrmd[2026]: notice: p_vip_ceph-ds1_stop_0:28482:stderr [ SIOCDELRT: No such process ]
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 crmd[2029]: notice: Operation p_vip_ceph-ds1_stop_0: ok (node=MS-CEPH-Proxy1, call=471, rc=0, cib-update=318, confirmed=true)
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: Un-exporting file system ...
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: unexporting 10.3.20.0/24:/mnt/Ceph-DS1
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: Unlocked NFS export /mnt/Ceph-DS1
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: Un-exported file system(s)
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 crmd[2029]: notice: Operation p_export_ceph-ds1_stop_0: ok (node=MS-CEPH-Proxy1, call=473, rc=0, cib-update=319, confirmed=true)
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28549]: INFO: Exporting file system(s) ...
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28549]: INFO: exporting 10.3.20.0/24:/mnt/Ceph-DS1
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28549]: INFO: directory /mnt/Ceph-DS1 exported
> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 crmd[2029]: notice: Operation p_export_ceph-ds1_start_0: ok (node=MS-CEPH-Proxy1, call=474, rc=0, cib-update=320, confirmed=true)
> >> >
> >> > If I enable the read/write checks for the FS resource, they also
> >> > time out at the same time.
> >>
> >> What about syslog that the above corresponds to?
> >
> > I get exactly the same "_monitor" timeout message.
>
> No "libceph: " or "rbd: " messages at all? No WARNs or hung tasks?
>
> > Is there anything logging-wise I can do with the kernel client to log when
> > an IO is taking a long time? Sort of like the slow requests in Ceph, but
> > client side?
>
> Nothing out of the box, as slow requests are usually not the client
> implementation's fault. Can you put together a script that would snapshot
> all files in /sys/kernel/debug/ceph/<cluster-fsid.client-id>/*
> on the gateways every second and rotate on an hourly basis? One of those
> files, osdc, lists in-flight requests. If that's empty when the timeouts
> occur then it's probably not krbd.

I've managed to manually dump osdc when one of the hangs occurred:

cat /sys/kernel/debug/ceph/d027d580-d69d-48f4-9d28-9b1650b57cce.client31526289/osdc
4747768 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
4747770 osd75 17.c3a5d697 rbd_data.157b149238e1f29.0000000000000014 set-alloc-hint,write
4747782 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
4747792 osd75 17.65154603 rb.0.4d983.238e1f29.000000022551 set-alloc-hint,write
4747793 osd75 17.65154603 rb.0.4d983.238e1f29.000000022551 set-alloc-hint,write
4747803 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
4747812 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
4747823 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
4747830 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
4747837 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
4747844 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write

So from what you are saying, this is not a krbd problem, as there are pending
IOs in flight?

> What Maged said, and also can you clarify what those "read/write checks for
> the FS resource" do exactly? read/write to local xfs on /dev/rbd* or further
> up?

The FS checks use dd to write to the filesystem, and then a combination of
test and cat to read back.

>
> Thanks,
>
> Ilya

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
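For anyone following the thread, below is a rough sketch of the kind of
debugfs snapshot script Ilya describes above (copy everything under
/sys/kernel/debug/ceph/ once a second, keep per-hour directories). The
output location, retention window and file naming are assumptions, not
anything specified in the thread, so adjust to taste:

#!/bin/bash
# Sketch: snapshot the kernel client debugfs files (osdc, monc, osdmap, ...)
# once a second and prune on hourly boundaries.
# OUT_BASE and the ~3 hour retention are assumptions; change as needed.
DEBUGFS_DIR=/sys/kernel/debug/ceph
OUT_BASE=/var/log/ceph-debugfs-snapshots

while true; do
    # One directory per hour, one snapshot directory per second.
    snap_dir="$OUT_BASE/$(date +%Y%m%d-%H)/$(date +%H%M%S)"
    mkdir -p "$snap_dir"

    # Copy every file for every <cluster-fsid>.client<id> instance present.
    for f in "$DEBUGFS_DIR"/*/*; do
        [ -f "$f" ] || continue
        cat "$f" > "$snap_dir/$(basename "$(dirname "$f")").$(basename "$f")" 2>/dev/null
    done

    # Crude hourly rotation: drop per-hour directories untouched for ~3 hours.
    find "$OUT_BASE" -mindepth 1 -maxdepth 1 -type d -mmin +180 -exec rm -rf {} + 2>/dev/null

    sleep 1
done

Run it as root on each gateway (debugfs is root-only), e.g. under nohup or a
systemd unit, and when pacemaker next logs a monitor timeout you can compare
the saved osdc files either side of that timestamp to see whether requests
were stuck in flight.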