Re: rbd-nbd timeout and crash

Sorry for the late answer.
No - I'm not mounting with trim enabled, only noatime.
The problem is that the cluster was highly loaded, so there were timeouts.
I "solved" it by compiling
https://github.com/jerome-pouiller/ioctl
and setting the NBD_SET_TIMEOUT ioctl timeout after creating the device.

With regards
Jan Pekar

On 6.12.2017 23:58, David Turner wrote:
Do you have the FS mounted with discard/trim enabled?  What are your mount options?

On Wed, Dec 6, 2017 at 5:30 PM Jan Pekař - Imatic <jan.pekar@xxxxxxxxx> wrote:

    Hi,

    On 6.12.2017 15:24, Jason Dillaman wrote:
     > On Wed, Dec 6, 2017 at 3:46 AM, Jan Pekař - Imatic
     <jan.pekar@xxxxxxxxx> wrote:
     >> Hi,
     >> I ran into an overloaded cluster (deep-scrub running) for a few
     >> seconds, the rbd-nbd client timed out, and the device became
     >> unavailable.
     >>
     >> block nbd0: Connection timed out
     >> block nbd0: shutting down sockets
     >> block nbd0: Connection timed out
     >> print_req_error: I/O error, dev nbd0, sector 2131833856
     >> print_req_error: I/O error, dev nbd0, sector 2131834112
     >>
     >> Is there any way how to extend rbd-nbd timeout?
     >
     > Changing the default timeout of 30 seconds is supported by the
     > kernel [1], but it's not currently implemented in rbd-nbd.  I
     > opened a new feature ticket for adding this option [2] but it may be
     > more constructive to figure out how to address a >30 second IO stall
     > on your cluster during deep-scrub.
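Addressing the IO stall on the cluster side, as suggested above, usually means throttling deep-scrub so it leaves bandwidth for client IO. A hedged sketch using standard OSD settings (the values below are illustrative starting points, not recommendations):

```shell
# Limit concurrent scrubs per OSD
ceph tell osd.* injectargs '--osd_max_scrubs 1'
# Sleep between scrub chunk reads so client IO can interleave
ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
# Only start scrubs when the OSD host load average is low
ceph tell osd.* injectargs '--osd_scrub_load_threshold 0.5'
```

injectargs changes take effect immediately but do not persist across OSD restarts; to make them permanent, set the same options in ceph.conf.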

    The kernel client does not support the new image features, so I decided
    to use rbd-nbd.
    Now I tried to rm a 300 GB folder, mounted with rbd-nbd from a COW
    snapshot, on my healthy and almost idle cluster with only one deep-scrub
    running, and I also hit the 30 s timeout and device disconnect. I'm
    mapping it from a virtual server, so there can be some performance
    issue, but I'm not chasing performance, only stability.

    Thank you
    With regards
    Jan Pekar

     >
     >> Also getting mapped devices failed -
     >>
     >> rbd-nbd list-mapped
     >>
     >> /build/ceph-12.2.2/src/tools/rbd_nbd/rbd-nbd.cc: In function 'int
     >> get_mapped_info(int, Config*)' thread 7f069d41ec40 time 2017-12-06
     >> 09:40:33.541426
     >> /build/ceph-12.2.2/src/tools/rbd_nbd/rbd-nbd.cc: 841: FAILED
     >> assert(ifs.is_open())
     >>   ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba)
    luminous
     >> (stable)
     >>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
     >> const*)+0x102) [0x7f0693f567c2]
     >>   2: (()+0x14165) [0x559a8783d165]
     >>   3: (main()+0x9) [0x559a87838e59]
     >>   4: (__libc_start_main()+0xf1) [0x7f0691178561]
     >>   5: (()+0xff80) [0x559a87838f80]
     >>   NOTE: a copy of the executable, or `objdump -rdS <executable>`
    is needed to
     >> interpret this.
     >> Aborted
     >
     > It's been fixed in the master branch and is awaiting backport to
     > Luminous [3] -- I'd expect it to be available in v12.2.3.
     >
     >>
     >> Thank you
     >> With regards
     >> Jan Pekar
     >> _______________________________________________
     >> ceph-users mailing list
     >> ceph-users@xxxxxxxxxxxxxx
     >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
     >
     > [1]
    https://github.com/torvalds/linux/blob/master/drivers/block/nbd.c#L1166
     > [2] http://tracker.ceph.com/issues/22333
     > [3] http://tracker.ceph.com/issues/22185
     >
     >

    --
    ============
    Ing. Jan Pekař
    jan.pekar@xxxxxxxxx | +420603811737
    ----
    Imatic | Jagellonská 14 | Praha 3 | 130 00
    http://www.imatic.cz
    ============

