Re: reproducible rbd-nbd crashes

On Wed, Jul 31, 2019 at 6:20 AM Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>
> Hello Jason,
>
> it seems that there is something wrong in the rbd-nbd implementation.
> (I also added this information at https://tracker.ceph.com/issues/40822)
>
> The problem does not seem to be related to kernel releases, filesystem types, or the ceph and network setup.
> Release 12.2.5 seems to work properly, and at least releases >= 12.2.10 seem to have the described problem.

Here is the complete delta between the two releases in rbd-nbd:

$ git diff v12.2.5..v12.2.12 -- .
diff --git a/src/tools/rbd_nbd/rbd-nbd.cc b/src/tools/rbd_nbd/rbd-nbd.cc
index 098d9925ca..aefdbd36e0 100644
--- a/src/tools/rbd_nbd/rbd-nbd.cc
+++ b/src/tools/rbd_nbd/rbd-nbd.cc
@@ -595,14 +595,13 @@ static int do_map(int argc, const char *argv[],
Config *cfg)
       cerr << err << std::endl;
       return r;
     }
-
     if (forker.is_parent()) {
-      global_init_postfork_start(g_ceph_context);
       if (forker.parent_wait(err) != 0) {
         return -ENXIO;
       }
       return 0;
     }
+    global_init_postfork_start(g_ceph_context);
   }

   common_init_finish(g_ceph_context);
@@ -724,8 +723,8 @@ static int do_map(int argc, const char *argv[], Config *cfg)

   if (info.size > ULONG_MAX) {
     r = -EFBIG;
-    cerr << "rbd-nbd: image is too large (" << prettybyte_t(info.size)
-         << ", max is " << prettybyte_t(ULONG_MAX) << ")" << std::endl;
+    cerr << "rbd-nbd: image is too large (" << byte_u_t(info.size)
+         << ", max is " << byte_u_t(ULONG_MAX) << ")" << std::endl;
     goto close_nbd;
   }

@@ -761,9 +760,8 @@ static int do_map(int argc, const char *argv[], Config *cfg)
     cout << cfg->devpath << std::endl;

     if (g_conf->daemonize) {
-      forker.daemonize();
-      global_init_postfork_start(g_ceph_context);
       global_init_postfork_finish(g_ceph_context);
+      forker.daemonize();
     }

     {

It's basically just a log message tweak and some changes to how the
process is daemonized. If you could re-test w/ each release after
12.2.5 and pinpoint where the issue starts occurring, we would have
something more to investigate.
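
Something like the sketch below could automate that walk through the
luminous point releases; it is only a sketch -- the per-release apt
repository paths under download.ceph.com and the exact package version
strings are assumptions and need to be checked against your distribution:

-----
#!/bin/bash
# Sketch: step through the luminous point releases after 12.2.5,
# reinstall rbd-nbd/ceph-common at each one, and rerun the stress test
# until the crash reappears.
set -e
for ver in 12.2.6 12.2.7 12.2.8 12.2.9 12.2.10 12.2.11 12.2.12; do
  # assumption: per-release apt repos exist at download.ceph.com/debian-<ver>/
  echo "deb https://download.ceph.com/debian-${ver}/ $(lsb_release -sc) main" \
    > /etc/apt/sources.list.d/ceph.list
  apt-get update -qq
  # look up the exact package version string for this release
  pkgver=$(apt-cache madison rbd-nbd | awk -v v="$ver" '$3 ~ v { print $3; exit }')
  apt-get install -y --allow-downgrades "rbd-nbd=${pkgver}" "ceph-common=${pkgver}"
  rbd-nbd --version
  echo "now remap the image, rerun the gzip/gunzip loop and watch dmesg for nbd errors"
  read -p "press enter to continue with the next release " _
done
-----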

> Last night an 18-hour test run with the following procedure was successful:
> -----
> #!/bin/bash
> set -x
> while true; do
>    date
>    find /srv_ec -type f -name "*.MYD" -print0 |head -n 50|xargs -0 -P 10 -n 2 gzip -v
>    date
>    find /srv_ec -type f -name "*.MYD.gz" -print0 |head -n 50|xargs -0 -P 10 -n 2 gunzip -v
> done
> -----
> Previous tests crashed reproducibly with "-P 1" (single-threaded gzip/gunzip) after anywhere from a few minutes up to 45 minutes.
>
> Overview of my tests:
>
> - SUCCESSFUL: kernel 4.15, ceph 12.2.5, 1TB ec-volume, ext4 file system, 120s device timeout
>   -> 18-hour test run was successful, no dmesg output
> - FAILED: kernel 4.4, ceph 12.2.11, 2TB ec-volume, xfs file system, 120s device timeout
>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created without reboot
>   -> parallel krbd device usage with 99% io usage worked without a problem while running the test
> - FAILED: kernel 4.15, ceph 12.2.11, 2TB ec-volume, xfs file system, 120s device timeout
>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created
>   -> parallel krbd device usage with 99% io usage worked without a problem while running the test
> - FAILED: kernel 4.4, ceph 12.2.11, 2TB ec-volume, xfs file system, no timeout
>   -> failed after < 10 minutes
>   -> the system runs into high load and is almost unusable; it cannot be shut down, a hard reset of the VM is necessary, and the exclusive lock has to be removed manually before the device can be remapped
> - FAILED: kernel 4.4, ceph 12.2.11, 2TB 3-replica-volume, xfs file system, 120s device timeout
>   -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created
>
> All device timeouts were set separately with the nbd_set_ioctl tool, because the luminous rbd-nbd does not provide an option to define timeouts.
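
For reference, defining the timeout per device boils down to the
NBD_SET_TIMEOUT ioctl from <linux/nbd.h>. The sketch below builds a
tiny helper that presumably does the same thing as the nbd_set_ioctl
tool mentioned above; the helper name, the device /dev/nbd0, and the
invocation are illustrative assumptions, not the actual tool:

-----
# build a tiny helper that issues NBD_SET_TIMEOUT against a mapped device
cat > nbd-set-timeout.c <<'EOF'
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/nbd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <device> <seconds>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open"); return 1; }
    /* the ioctl argument is the timeout in seconds */
    if (ioctl(fd, NBD_SET_TIMEOUT, (unsigned long)atoi(argv[2])) < 0) {
        perror("NBD_SET_TIMEOUT");
        return 1;
    }
    return 0;
}
EOF
gcc -o nbd-set-timeout nbd-set-timeout.c
./nbd-set-timeout /dev/nbd0 120   # 120s timeout, as used in the tests above
-----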
>
> What's next? Is it a good idea to do a binary search between 12.2.5 and 12.2.12?
>
> From my point of view (without in-depth knowledge of rbd-nbd/librbd), my assumption is that this problem might be caused by the rbd-nbd code and not by librbd.
> The probability that a bug like this would survive undiscovered in librbd for such a long time seems low to me :-)
>
> Regards
> Marc
>
> On 29.07.19 at 22:25, Marc Schöchlin wrote:
> > Hello Jason,
> >
> > i updated the ticket https://tracker.ceph.com/issues/40822
> >
> > On 24.07.19 at 19:20, Jason Dillaman wrote:
> >> On Wed, Jul 24, 2019 at 12:47 PM Marc Schöchlin <ms@xxxxxxxxxx> wrote:
> >>> Testing with a 12.2.5 librbd/rbd-nbd is currently not that easy for me, because the ceph apt source does not contain that version.
> >>> Do you know a package source?
> >> All the upstream packages should be available here [1], including 12.2.5.
> > Ah okay, I will test this tomorrow.
> >> Did you pull the OSD blocked ops stats to figure out what is going on
> >> with the OSDs?
> > Yes, see referenced data in the ticket https://tracker.ceph.com/issues/40822#note-15
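
For completeness, a minimal sketch of how such blocked-ops data can be
pulled per OSD via the admin socket (assuming the default /var/run/ceph
socket layout; run on each OSD host):

-----
# dump blocked and in-flight operations from every OSD admin socket on this host
for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo "== ${sock} =="
    ceph daemon "${sock}" dump_blocked_ops
    ceph daemon "${sock}" dump_ops_in_flight
done
# cluster-wide view of slow/blocked requests
ceph health detail
-----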
> >
> > Regards
> > Marc
> >



-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



