> -----Original Message-----
> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
> Sent: 01 July 2017 13:19
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Kernel mounted RBD's hanging
>
> On Sat, Jul 1, 2017 at 9:29 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> -----Original Message-----
> >> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
> >> Sent: 30 June 2017 14:06
> >> To: Nick Fisk <nick@xxxxxxxxxx>
> >> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> >> Subject: Re: Kernel mounted RBD's hanging
> >>
> >> On Fri, Jun 30, 2017 at 2:14 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
> >> >> Sent: 29 June 2017 18:54
> >> >> To: Nick Fisk <nick@xxxxxxxxxx>
> >> >> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> >> >> Subject: Re: Kernel mounted RBD's hanging
> >> >>
> >> >> On Thu, Jun 29, 2017 at 6:22 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >> >> -----Original Message-----
> >> >> >> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
> >> >> >> Sent: 29 June 2017 16:58
> >> >> >> To: Nick Fisk <nick@xxxxxxxxxx>
> >> >> >> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> >> >> >> Subject: Re: Kernel mounted RBD's hanging
> >> >> >>
> >> >> >> On Thu, Jun 29, 2017 at 4:30 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> >> >> > Hi All,
> >> >> >> >
> >> >> >> > Putting out a call for help to see if anyone can shed some light on this.
> >> >> >> >
> >> >> >> > Configuration:
> >> >> >> > Ceph cluster presenting RBD's->XFS->NFS->ESXi
> >> >> >> > Running 10.2.7 on the OSD's and 4.11 kernel on the NFS gateways in a pacemaker cluster
> >> >> >> > Both OSD's and clients go into a pair of switches, single L2 domain (no sign from pacemaker that there are network connectivity issues)
> >> >> >> >
> >> >> >> > Symptoms:
> >> >> >> > - All RBD's on a single client randomly hang for 30s to several minutes, confirmed by pacemaker and ESXi hosts complaining
> >> >> >>
> >> >> >> Hi Nick,
> >> >> >>
> >> >> >> What is a "single client" here?
> >> >> >
> >> >> > I mean a node of the pacemaker cluster. So all RBD's on the same pacemaker node hang.
> >> >> >
> >> >> >>
> >> >> >> > - Cluster load is minimal when this happens most times
> >> >> >>
> >> >> >> Can you post gateway syslog and point at when this happened?
> >> >> >> Corresponding pacemaker excerpts won't hurt either.
> >> >> >
> >> >> > Jun 28 16:35:38 MS-CEPH-Proxy1 lrmd[2026]: warning: p_export_ceph-ds1_monitor_60000 process (PID 17754) timed out
> >> >> > Jun 28 16:35:43 MS-CEPH-Proxy1 lrmd[2026]: crit: p_export_ceph-ds1_monitor_60000 process (PID 17754) will not die!
> >> >> > Jun 28 16:43:51 MS-CEPH-Proxy1 lrmd[2026]: warning: p_export_ceph-ds1_monitor_60000:17754 - timed out after 30000ms
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 IPaddr(p_vip_ceph-ds1)[28482]: INFO: ifconfig ens224:0 down
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 lrmd[2026]: notice: p_vip_ceph-ds1_stop_0:28482:stderr [ SIOCDELRT: No such process ]
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 crmd[2029]: notice: Operation p_vip_ceph-ds1_stop_0: ok (node=MS-CEPH-Proxy1, call=471, rc=0, cib-update=318, confirmed=true)
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: Un-exporting file system ...
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: unexporting 10.3.20.0/24:/mnt/Ceph-DS1
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: Unlocked NFS export /mnt/Ceph-DS1
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28499]: INFO: Un-exported file system(s)
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 crmd[2029]: notice: Operation p_export_ceph-ds1_stop_0: ok (node=MS-CEPH-Proxy1, call=473, rc=0, cib-update=319, confirmed=true)
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28549]: INFO: Exporting file system(s) ...
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28549]: INFO: exporting 10.3.20.0/24:/mnt/Ceph-DS1
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 exportfs(p_export_ceph-ds1)[28549]: INFO: directory /mnt/Ceph-DS1 exported
> >> >> > Jun 28 16:43:52 MS-CEPH-Proxy1 crmd[2029]: notice: Operation p_export_ceph-ds1_start_0: ok (node=MS-CEPH-Proxy1, call=474, rc=0, cib-update=320, confirmed=true)
> >> >> >
> >> >> > If I enable the read/write checks for the FS resource, they also timeout at the same time.
> >> >>
> >> >> What about syslog that the above corresponds to?
> >> >
> >> > I get exactly the same "_monitor" timeout message.
> >>
> >> No "libceph: " or "rbd: " messages at all? No WARNs or hung tasks?
> >>
> >> >
> >> > Is there anything logging-wise I can do with the kernel client to log when an IO is taking a long time? Sort of like the slow requests in Ceph, but client side?
> >>
> >> Nothing out of the box, as slow requests are usually not the client implementation's fault. Can you put together a script that would snapshot all files in /sys/kernel/debug/ceph/<cluster-fsid.client-id>/* on the gateways every second and rotate on an hourly basis? One of those files, osdc, lists in-flight requests. If that's empty when the timeouts occur then it's probably not krbd.
> >
> > I've managed to manually dump osdc when one of the hangs occurred:
> >
> > cat /sys/kernel/debug/ceph/d027d580-d69d-48f4-9d28-9b1650b57cce.client31526289/osdc
> > 4747768  osd75  17.7366b517  rb.0.4d983.238e1f29.0000000b72da  set-alloc-hint,write
> > 4747770  osd75  17.c3a5d697  rbd_data.157b149238e1f29.0000000000000014  set-alloc-hint,write
> > 4747782  osd75  17.7366b517  rb.0.4d983.238e1f29.0000000b72da  set-alloc-hint,write
> > 4747792  osd75  17.65154603  rb.0.4d983.238e1f29.000000022551  set-alloc-hint,write
> > 4747793  osd75  17.65154603  rb.0.4d983.238e1f29.000000022551  set-alloc-hint,write
> > 4747803  osd75  17.7366b517  rb.0.4d983.238e1f29.0000000b72da  set-alloc-hint,write
> > 4747812  osd75  17.7366b517  rb.0.4d983.238e1f29.0000000b72da  set-alloc-hint,write
> > 4747823  osd75  17.7366b517  rb.0.4d983.238e1f29.0000000b72da  set-alloc-hint,write
> > 4747830  osd75  17.7366b517  rb.0.4d983.238e1f29.0000000b72da  set-alloc-hint,write
> > 4747837  osd75  17.7366b517  rb.0.4d983.238e1f29.0000000b72da  set-alloc-hint,write
> > 4747844  osd75  17.7366b517  rb.0.4d983.238e1f29.0000000b72da  set-alloc-hint,write
> >
> > So from what you are saying, this is not a krbd problem as there are pending IO's in flight?
>
> No -- it's not empty. Do you happen to have more samples from that particular hang? If these same requests just sit there for minutes, that's definitely a ceph problem, whether krbd or cluster side.
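For reference, a minimal sketch of the kind of per-second snapshot script Ilya suggests above, assuming bash on the gateways and a simple per-hour directory scheme for rotation (the output path, retention period and file naming below are illustrative assumptions, not anything from this thread):

#!/bin/bash
# Rough sketch of a per-second debugfs snapshotter with crude hourly rotation.
# Assumptions: run as root on each gateway, output under /var/log/ceph-debug,
# keep the last KEEP_HOURS hours of snapshots.

DEBUG_DIR=/sys/kernel/debug/ceph
OUT_BASE=/var/log/ceph-debug
KEEP_HOURS=6

mkdir -p "$OUT_BASE"

while true; do
    hour_dir="$OUT_BASE/$(date +%Y%m%d-%H)"
    mkdir -p "$hour_dir"
    ts=$(date +%H%M%S)

    # Snapshot every file (osdc, monc, osdmap, ...) of every
    # <cluster-fsid.client-id> instance present on this gateway.
    for f in "$DEBUG_DIR"/*/*; do
        [ -f "$f" ] || continue
        client=$(basename "$(dirname "$f")")
        cat "$f" > "$hour_dir/$client.$(basename "$f").$ts" 2>/dev/null
    done

    # Rotation: drop per-hour directories older than KEEP_HOURS.
    find "$OUT_BASE" -mindepth 1 -maxdepth 1 -type d -mmin +$((KEEP_HOURS * 60)) -exec rm -rf {} +

    sleep 1
done

With something like this in place, the osdc snapshots taken during a pacemaker monitor timeout can be lined up against the hang window to see whether the same request tids stay in flight for the whole period, which is the distinction Ilya draws above.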
Managed to catch another one, osd.75 again, not sure if that is an indication of anything or just a co-incidence. osd.75 is one of 8 OSD's in a cache tier, so all IO will be funnelled through them.

cat /sys/kernel/debug/ceph/d027d580-d69d-48f4-9d28-9b1650b57cce.client31443905/osdc
REQUESTS 13 homeless 0
130947221  osd75  17.dbb45597  [75,73,25]/75  [75,73,25]/75  rbd_data.158f204238e1f29.0000000000080171  0x400024  1  0'0  set-alloc-hint,write
130947226  osd75  17.4f47f0c3  [75,14,72]/75  [75,14,72]/75  rbd_data.1555406238e1f29.000000000007c8a9  0x400024  1  0'0  set-alloc-hint,write
130947231  osd75  17.a184a1cc  [75,72,3]/75   [75,72,3]/75   rbd_data.15d8670238e1f29.0000000000064054  0x400024  1  0'0  set-alloc-hint,write
130947274  osd75  17.4d83ed0c  [75,72,3]/75   [75,72,3]/75   rbd_data.1555406238e1f29.000000000007ccc1  0x400024  1  0'0  set-alloc-hint,write
130947349  osd75  17.dbb45597  [75,73,25]/75  [75,73,25]/75  rbd_data.158f204238e1f29.0000000000080171  0x400024  1  0'0  set-alloc-hint,write
130947421  osd75  17.32207383  [75,14,72]/75  [75,14,72]/75  rbd_data.15d8670238e1f29.0000000000000000  0x400024  1  0'0  set-alloc-hint,write
130947472  osd75  17.dbb45597  [75,73,25]/75  [75,73,25]/75  rbd_data.158f204238e1f29.0000000000080171  0x400024  1  0'0  set-alloc-hint,write
130947474  osd75  17.32207383  [75,14,72]/75  [75,14,72]/75  rbd_data.15d8670238e1f29.0000000000000000  0x400024  1  0'0  set-alloc-hint,write
130947689  osd75  17.dbb45597  [75,73,25]/75  [75,73,25]/75  rbd_data.158f204238e1f29.0000000000080171  0x400024  1  0'0  set-alloc-hint,write
130947740  osd75  17.dbb45597  [75,73,25]/75  [75,73,25]/75  rbd_data.158f204238e1f29.0000000000080171  0x400024  1  0'0  set-alloc-hint,write
130947783  osd75  17.dbb45597  [75,73,25]/75  [75,73,25]/75  rbd_data.158f204238e1f29.0000000000080171  0x400024  1  0'0  set-alloc-hint,write
130947826  osd75  17.dbb45597  [75,73,25]/75  [75,73,25]/75  rbd_data.158f204238e1f29.0000000000080171  0x400024  1  0'0  set-alloc-hint,write
130947868  osd75  17.dbb45597  [75,73,25]/75  [75,73,25]/75  rbd_data.158f204238e1f29.0000000000080171  0x400024  1  0'0  set-alloc-hint,write
LINGER REQUESTS
18446462598732840990  osd74  17.145baa0f  [74,72,14]/74  [74,72,14]/74  rbd_header.158f204238e1f29  0x20  0  WC/0
18446462598732840991  osd74  17.7b4e2a06  [74,72,25]/74  [74,72,25]/74  rbd_header.1555406238e1f29  0x20  0  WC/0
18446462598732840992  osd74  17.eea94d58  [74,73,25]/74  [74,73,25]/74  rbd_header.15d8670238e1f29  0x20  0  WC/0

Also found this in the log of osd.75 at the same time, but the client IP is not the same as the node which experienced the hang.

2017-07-05 21:11:18.653607 7f788493d700  0 bad crc in data 849850452 != exp 2621724690
2017-07-05 21:11:18.655004 7f7896fa6700  0 -- 10.3.31.15:6800/31387 submit_message osd_op_reply(125450617 rbd_data.15a5e20238e1f29.000000000023dede [set-alloc-hint object_size 4194304 write_size 4194304,write 1671168~4096] v99810'110406048 uv113132316 ondisk = 0) v7 remote, 10.3.31.43:0/914663476, failed lossy con, dropping message 0x55950d437a00
2017-07-05 21:12:21.450978 7f788493d700  0 bad crc in data 1050156366 != exp 653845871
2017-07-05 21:12:34.515905 7f788493d700  0 bad crc in data 3801395556 != exp 180104028

>
> Thanks,
>
>                 Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com