Re: Crash with rados cppool and snapshots

On Wed, Oct 29, 2014 at 7:49 AM, Daniel Schneller
<daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
> Hi!
>
> We are exploring options to regularly preserve (i.e. back up) the
> contents of the pools backing our rados gateways. For that we create
> nightly snapshots of all the relevant pools while there is no activity
> on the system, so that we get consistent states.
>
> In order to restore whole pools to a specific snapshot state, we tried
> to use the rados cppool command (see below) to copy a snapshot's state
> into a new pool. Unfortunately, this causes a segfault. Are we doing
> anything wrong?
>
> This command:
>
> rados cppool --snap snap-1 deleteme.lp deleteme.lp2 2> segfault.txt
>
> Produces this output:
>
> *** Caught signal (Segmentation fault) **
>  in thread 7f8f49a927c0
>
>  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>  1: rados() [0x43eedf]
>  2: (()+0x10340) [0x7f8f48738340]
>  3: (librados::IoCtxImpl::snap_lookup(char const*, unsigned long*)+0x17) [0x7f8f48aff127]
>  4: (main()+0x1385) [0x411e75]
>  5: (__libc_start_main()+0xf5) [0x7f8f4795fec5]
>  6: rados() [0x41c6f7]
>
> 2014-10-29 12:03:22.761653 7f8f49a927c0 -1 *** Caught signal (Segmentation fault) **
>  in thread 7f8f49a927c0
>
>  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>  1: rados() [0x43eedf]
>  2: (()+0x10340) [0x7f8f48738340]
>  3: (librados::IoCtxImpl::snap_lookup(char const*, unsigned long*)+0x17) [0x7f8f48aff127]
>  4: (main()+0x1385) [0x411e75]
>  5: (__libc_start_main()+0xf5) [0x7f8f4795fec5]
>  6: rados() [0x41c6f7]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>  to interpret this.
>
> Full segfault file and the objdump output for the rados command can be
> found here:
>
> - https://public.centerdevice.de/53bddb80-423e-4213-ac62-59fe8dbb9bea
> - https://public.centerdevice.de/50b81566-41fb-439a-b58b-e1e32d75f32a
>
> We updated to the 0.80.7 release (we had seen the issue with 0.80.5
> before and had hoped that the long list of bugfixes in the release
> notes would include a fix for this) but are still seeing it. Rados
> gateways, OSDs, MONs etc. have all been restarted after the update.
> Package versions are as follows:
>
> daniel.schneller@node01 [~] $
> ➜  dpkg -l | grep ceph
> ii  ceph                                0.80.7-1trusty
> ii  ceph-common                         0.80.7-1trusty
> ii  ceph-fs-common                      0.80.7-1trusty
> ii  ceph-fuse                           0.80.7-1trusty
> ii  ceph-mds                            0.80.7-1trusty
> ii  libcephfs1                          0.80.7-1trusty
> ii  python-ceph                         0.80.7-1trusty
>
> daniel.schneller@node01 [~] $
> ➜  uname -a
> Linux node01 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:06:16
>    UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> Copying without the snapshot works. Should this work at least in
> theory?
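
For context, the workflow described above boils down to taking a pool
snapshot and then trying to copy from it, presumably something like:

  rados -p deleteme.lp mksnap snap-1
  rados cppool --snap snap-1 deleteme.lp deleteme.lp2

with the actual rgw pool names in place of the example ones.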

Well, that's interesting. I'm not sure if this can be expected to work
properly, but it certainly shouldn't crash there. Looking at it a bit,
you can make it not crash by specifying "-p deleteme.lp" as well, but
it simply copies the current state of the pool, not the snapped state.
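
For concreteness, the invocation that avoids the crash should look
roughly like this (exact flag placement from memory, so treat it as a
sketch):

  rados -p deleteme.lp cppool --snap snap-1 deleteme.lp deleteme.lp2

The snapped data gets ignored because the tool never points its read
context at the snapshot. A snapshot-aware copy would have to do
something along these lines with the librados C++ API (an untested
sketch only; error handling, xattrs and omap are left out, and the
object name is a placeholder):

  #include <rados/librados.hpp>
  #include <string>
  #include <ctime>

  int main() {
    librados::Rados cluster;
    cluster.init("admin");            // connect as client.admin
    cluster.conf_read_file(NULL);     // default ceph.conf locations
    cluster.connect();

    librados::IoCtx src, dst;
    cluster.ioctx_create("deleteme.lp", src);
    cluster.ioctx_create("deleteme.lp2", dst);

    // Resolve the snapshot name and make all reads on src come from it.
    librados::snap_t snapid;
    src.snap_lookup("snap-1", &snapid);
    src.snap_set_read(snapid);

    // For each object in the source pool (iteration omitted here),
    // read the snapped contents and write them to the destination.
    std::string oid = "example-object";   // placeholder object name
    uint64_t size;
    time_t mtime;
    src.stat(oid, &size, &mtime);
    librados::bufferlist bl;
    src.read(oid, bl, size, 0);
    dst.write_full(oid, bl);

    cluster.shutdown();
    return 0;
  }
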
If you could generate a ticket or two at tracker.ceph.com, that would
be helpful!
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




