On Sat, Jul 1, 2017 at 9:29 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> -----Original Message-----
>> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
>> Sent: 30 June 2017 14:06
>> To: Nick Fisk <nick@xxxxxxxxxx>
>> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re: Kernel mounted RBD's hanging
>>
>> On Fri, Jun 30, 2017 at 2:14 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >> -----Original Message-----
>> >> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
>> >> Sent: 29 June 2017 18:54
>> >> To: Nick Fisk <nick@xxxxxxxxxx>
>> >> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
>> >> Subject: Re: Kernel mounted RBD's hanging
>> >>
>> >> On Thu, Jun 29, 2017 at 6:22 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >> >> -----Original Message-----
>> >> >> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
>> >> >> Sent: 29 June 2017 16:58
>> >> >> To: Nick Fisk <nick@xxxxxxxxxx>
>> >> >> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
>> >> >> Subject: Re: Kernel mounted RBD's hanging
>> >> >>
>> >> >> On Thu, Jun 29, 2017 at 4:30 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >> >> > Hi All,
>> >> >> >
>> >> >> > Putting out a call for help to see if anyone can shed some light on this.
>> >> >> >
>> >> >> > Configuration:
>> >> >> > Ceph cluster presenting RBD's -> XFS -> NFS -> ESXi.
>> >> >> > Running 10.2.7 on the OSD's and a 4.11 kernel on the NFS gateways, which sit in a pacemaker cluster.
>> >> >> > Both OSD's and clients go into a pair of switches, single L2 domain (no sign from pacemaker that there are network connectivity issues).
>> >> >> >
>> >> >> > Symptoms:
>> >> >> > - All RBD's on a single client randomly hang for 30s to several minutes, confirmed by pacemaker and ESXi hosts complaining
>> >> >>
>> >> >> Hi Nick,
>> >> >>
>> >> >> What is a "single client" here?
>> >> >
>> >> > I mean a node of the pacemaker cluster. So all RBD's on the same pacemaker node hang.
>> >> >
>> >> >> > - Cluster load is minimal when this happens most times
>> >> >>
>> >> >> Can you post gateway syslog and point at when this happened?
>> >> >> Corresponding pacemaker excerpts won't hurt either.
>> >> >
>> >> > Jun 28 16:35:38 MS-CEPH-Proxy1 lrmd[2026]: warning: p_export_ceph-ds1_monitor_60000 process (PID 17754) timed out
>> >> > Jun 28 16:35:43 MS-CEPH-Proxy1 lrmd[2026]: crit: p_export_ceph-ds1_monitor_60000 process (PID 17754) will not die!
>> >> > Jun 28 16:43:51 MS-CEPH-Proxy1 lrmd[2026]: warning: p_export_ceph-ds1_monitor_60000:17754 - timed out after 30000ms
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 IPaddr(p_vip_ceph-ds1)[28482]: INFO: ifconfig ens224:0 down
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 lrmd[2026]: notice: p_vip_ceph-ds1_stop_0:28482:stderr [ SIOCDELRT: No such process ]
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 crmd[2029]: notice: Operation p_vip_ceph-ds1_stop_0: ok (node=MS-CEPH-Proxy1, call=471, rc=0, cib-update=318, confirmed=true)
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: Un-exporting file system ...
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: unexporting 10.3.20.0/24:/mnt/Ceph-DS1
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: Unlocked NFS export /mnt/Ceph-DS1
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: Un-exported file system(s)
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 crmd[2029]: notice: Operation p_export_ceph-ds1_stop_0: ok (node=MS-CEPH-Proxy1, call=473, rc=0, cib-update=319, confirmed=true)
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28549]: INFO: Exporting file system(s) ...
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28549]: INFO: exporting 10.3.20.0/24:/mnt/Ceph-DS1
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28549]: INFO: directory /mnt/Ceph-DS1 exported
>> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 crmd[2029]: notice: Operation p_export_ceph-ds1_start_0: ok (node=MS-CEPH-Proxy1, call=474, rc=0, cib-update=320, confirmed=true)
>> >> >
>> >> > If I enable the read/write checks for the FS resource, they also time out at the same time.
>> >>
>> >> What about syslog that the above corresponds to?
>> >
>> > I get exactly the same "_monitor" timeout message.
>>
>> No "libceph: " or "rbd: " messages at all? No WARNs or hung tasks?
>>
>> > Is there anything logging-wise I can do with the kernel client to log when an IO is taking a long time? Sort of like the slow requests in Ceph, but client side?
>>
>> Nothing out of the box, as slow requests are usually not the client
>> implementation's fault. Can you put together a script that would snapshot
>> all files in /sys/kernel/debug/ceph/<cluster-fsid.client-id>/* on the
>> gateways every second and rotate on an hourly basis? One of those files,
>> osdc, lists in-flight requests. If that's empty when the timeouts occur
>> then it's probably not krbd.
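A minimal sketch of the kind of snapshot loop being suggested here (the
output directory, one-second interval, and daily pruning are assumptions
for illustration, not a script from this thread):

    #!/bin/bash
    # Capture the krbd debugfs state (osdc lists in-flight OSD requests)
    # once per second, grouped into one directory per hour.
    DEBUGDIR=/sys/kernel/debug/ceph          # requires debugfs to be mounted
    OUTBASE=/var/log/ceph-osdc-snapshots     # assumed destination

    while true; do
        hour_dir="$OUTBASE/$(date +%Y%m%d-%H)"
        mkdir -p "$hour_dir"
        ts=$(date +%H%M%S)
        for client in "$DEBUGDIR"/*; do
            [ -d "$client" ] || continue
            for f in "$client"/*; do
                # file name becomes <time>.<fsid>.<client-id>.<file>
                cp "$f" "$hour_dir/${ts}.$(basename "$client").$(basename "$f")" 2>/dev/null
            done
        done
        # prune captures older than a day so they don't fill the disk
        find "$OUTBASE" -mindepth 1 -maxdepth 1 -type d -mmin +1440 -exec rm -rf {} + 2>/dev/null
        sleep 1
    done

Comparing the osdc captures taken across a hang (for example by sorting and
diffing the request IDs in the first column) shows whether the same requests
stay stuck for the whole period or whether new ones keep completing.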
> I've managed to manually dump osdc when one of the hangs occurred:
>
> cat /sys/kernel/debug/ceph/d027d580-d69d-48f4-9d28-9b1650b57cce.client31526289/osdc
> 4747768 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
> 4747770 osd75 17.c3a5d697 rbd_data.157b149238e1f29.0000000000000014 set-alloc-hint,write
> 4747782 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
> 4747792 osd75 17.65154603 rb.0.4d983.238e1f29.000000022551 set-alloc-hint,write
> 4747793 osd75 17.65154603 rb.0.4d983.238e1f29.000000022551 set-alloc-hint,write
> 4747803 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
> 4747812 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
> 4747823 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
> 4747830 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
> 4747837 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
> 4747844 osd75 17.7366b517 rb.0.4d983.238e1f29.0000000b72da set-alloc-hint,write
>
> So from what you are saying, this is not a krbd problem, as there are pending IO's in flight?

No -- it's not empty. Do you happen to have more samples from that
particular hang? If these same requests just sit there for minutes, that's
definitely a ceph problem, whether krbd or cluster side.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com