Re: rbd-nbd timeout and crash

Hi,

On 6.12.2017 15:24, Jason Dillaman wrote:
On Wed, Dec 6, 2017 at 3:46 AM, Jan Pekař - Imatic <jan.pekar@xxxxxxxxx> wrote:
Hi,
My cluster was overloaded for a few seconds (deep-scrub running), the rbd-nbd
client timed out, and the device became unavailable.

block nbd0: Connection timed out
block nbd0: shutting down sockets
block nbd0: Connection timed out
print_req_error: I/O error, dev nbd0, sector 2131833856
print_req_error: I/O error, dev nbd0, sector 2131834112

Is there any way to extend the rbd-nbd timeout?

Changing the default timeout of 30 seconds is supported by
the kernel [1], but it's not currently implemented in rbd-nbd.  I
opened a new feature ticket for adding this option [2] but it may be
more constructive to figure out how to address a >30 second IO stall
on your cluster during deep-scrub.
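
For reference, one kernel interface for this is the NBD_SET_TIMEOUT ioctl defined in <linux/nbd.h>. Below is a minimal Python sketch of how a client could call it. The helper names (set_nbd_timeout, set_timeout_on_device) are made up for illustration, and the request number assumes the common ioctl encoding used on x86 and ARM. This has not been verified against a device that rbd-nbd already holds open, so treat it as an illustration of the interface rather than a tested workaround.

```python
import fcntl
import os

# _IO(0xab, 9) from <linux/nbd.h>. Assumes the common encoding where
# _IOC_NONE is 0 (x86, ARM); some architectures encode it differently.
NBD_SET_TIMEOUT = (0xAB << 8) | 9


def set_nbd_timeout(fd, seconds, ioctl=fcntl.ioctl):
    """Ask the nbd driver to use `seconds` as the request timeout for fd.

    `ioctl` is injectable so the call can be exercised without a real device.
    """
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    return ioctl(fd, NBD_SET_TIMEOUT, seconds)


def set_timeout_on_device(path, seconds):
    """Hypothetical helper: open an nbd device node and set its timeout.

    Needs root and an existing node such as /dev/nbd0.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        return set_nbd_timeout(fd, seconds)
    finally:
        os.close(fd)
```

A call would look like set_timeout_on_device("/dev/nbd0", 120). A longer timeout only hides the stall, so the slow IO during deep-scrub would still need to be tracked down.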

The kernel client does not support the new image features, so I decided to use rbd-nbd. Now I tried to rm a 300 GB folder that is mounted with rbd-nbd from a COW snapshot, on my healthy and almost idle cluster with only one deep-scrub running, and I again hit the 30 s timeout and the device disconnected. I'm mapping it from a virtual server, so there may be a performance issue, but I'm after stability, not performance.

Thank you
With regards
Jan Pekar


Also, listing the mapped devices failed:

rbd-nbd list-mapped

/build/ceph-12.2.2/src/tools/rbd_nbd/rbd-nbd.cc: In function 'int get_mapped_info(int, Config*)' thread 7f069d41ec40 time 2017-12-06 09:40:33.541426
/build/ceph-12.2.2/src/tools/rbd_nbd/rbd-nbd.cc: 841: FAILED assert(ifs.is_open())
  ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f0693f567c2]
  2: (()+0x14165) [0x559a8783d165]
  3: (main()+0x9) [0x559a87838e59]
  4: (__libc_start_main()+0xf1) [0x7f0691178561]
  5: (()+0xff80) [0x559a87838f80]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted

It's been fixed in the master branch and is awaiting backport to
Luminous [3] -- I'd expect it to be available in v12.2.3.
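
Until a fixed build is installed, connected nbd devices can also be inspected straight from sysfs, since the nbd driver exposes a pid attribute for a device while it is connected. A rough Python sketch follows. It is not how rbd-nbd implements list-mapped, the function name list_connected_nbd is made up here, and it reports only the device and owning pid, not the pool or image.

```python
import glob
import os


def list_connected_nbd(sys_block="/sys/block"):
    """Return {device_name: pid} for nbd devices that currently expose a pid."""
    mapped = {}
    for pid_file in sorted(glob.glob(os.path.join(sys_block, "nbd*", "pid"))):
        device = os.path.basename(os.path.dirname(pid_file))
        try:
            with open(pid_file) as f:
                mapped[device] = int(f.read().strip())
        except (OSError, ValueError):
            # Device went away between glob and read, or unexpected content.
            continue
    return mapped


if __name__ == "__main__":
    for device, pid in list_connected_nbd().items():
        print(f"/dev/{device}\tpid {pid}")
```

The pid can then be matched against running rbd-nbd processes (for example via ps) to see which image each device belongs to.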


Thank you
With regards
Jan Pekar
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] https://github.com/torvalds/linux/blob/master/drivers/block/nbd.c#L1166
[2] http://tracker.ceph.com/issues/22333
[3] http://tracker.ceph.com/issues/22185



--
============
Ing. Jan Pekař
jan.pekar@xxxxxxxxx | +420603811737
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz
============
--