On Wed, Oct 29, 2014 at 7:49 AM, Daniel Schneller <daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
> Hi!
>
> We are exploring options to regularly preserve (i.e. backup) the
> contents of the pools backing our rados gateways. For that we create
> nightly snapshots of all the relevant pools when there is no activity
> on the system, to get consistent states.
>
> In order to restore whole pools back to a specific snapshot state,
> we tried to use the rados cppool command (see below) to copy a snapshot
> state into a new pool. Unfortunately this causes a segfault. Are we
> doing anything wrong?
>
> This command:
>
>     rados cppool --snap snap-1 deleteme.lp deleteme.lp2 2> segfault.txt
>
> produces this output:
>
>     *** Caught signal (Segmentation fault) **
>      in thread 7f8f49a927c0
>      ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>      1: rados() [0x43eedf]
>      2: (()+0x10340) [0x7f8f48738340]
>      3: (librados::IoCtxImpl::snap_lookup(char const*, unsigned long*)+0x17) [0x7f8f48aff127]
>      4: (main()+0x1385) [0x411e75]
>      5: (__libc_start_main()+0xf5) [0x7f8f4795fec5]
>      6: rados() [0x41c6f7]
>     2014-10-29 12:03:22.761653 7f8f49a927c0 -1 *** Caught signal (Segmentation fault) **
>      in thread 7f8f49a927c0
>
>      ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>      1: rados() [0x43eedf]
>      2: (()+0x10340) [0x7f8f48738340]
>      3: (librados::IoCtxImpl::snap_lookup(char const*, unsigned long*)+0x17) [0x7f8f48aff127]
>      4: (main()+0x1385) [0x411e75]
>      5: (__libc_start_main()+0xf5) [0x7f8f4795fec5]
>      6: rados() [0x41c6f7]
>     NOTE: a copy of the executable, or `objdump -rdS <executable>`,
>     is needed to interpret this.
>
> The full segfault file and the objdump output for the rados command can
> be found here:
>
> - https://public.centerdevice.de/53bddb80-423e-4213-ac62-59fe8dbb9bea
> - https://public.centerdevice.de/50b81566-41fb-439a-b58b-e1e32d75f32a
>
> We updated to the 0.80.7 release (we saw the issue with 0.80.5 before
> and had hoped that the long list of bugfixes in the release notes would
> include a fix for this), but we are still seeing it. Rados gateways,
> OSDs, MONs etc. have all been restarted after the update. Package
> versions are as follows:
>
>     daniel.schneller@node01 [~] $
>     ➜ dpkg -l | grep ceph
>     ii ceph            0.80.7-1trusty
>     ii ceph-common     0.80.7-1trusty
>     ii ceph-fs-common  0.80.7-1trusty
>     ii ceph-fuse       0.80.7-1trusty
>     ii ceph-mds        0.80.7-1trusty
>     ii libcephfs1      0.80.7-1trusty
>     ii python-ceph     0.80.7-1trusty
>
>     daniel.schneller@node01 [~] $
>     ➜ uname -a
>     Linux node01 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:06:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> Copying without the snapshot works. Should this work, at least in
> theory?

Well, that's interesting. I'm not sure whether this can be expected to
work properly, but it certainly shouldn't crash there. Looking at it a
bit: you can make it not crash by also specifying "-p deleteme.lp", but
it then simply copies the current state of the pool, not the snapped
state. If you could generate a ticket or two at tracker.ceph.com, that
would be helpful!
-Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
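For reference, the two invocations discussed in the thread can be summarized as below. This is a sketch only: the pool and snapshot names are the ones from the original report, it assumes a running Ceph 0.80.x cluster, and per Greg's reply the second command merely avoids the crash while still copying the pool's current state, not the snapshot.

```shell
# Crashing invocation from the report: cppool with --snap segfaults
# in librados::IoCtxImpl::snap_lookup on 0.80.5 and 0.80.7.
rados cppool --snap snap-1 deleteme.lp deleteme.lp2

# Workaround from Greg's reply: also passing -p with the source pool
# avoids the segfault, but the copy reflects the pool's CURRENT
# contents, ignoring the snapshot.
rados -p deleteme.lp cppool --snap snap-1 deleteme.lp deleteme.lp2
```

In other words, until the crash is fixed (and snapshot-aware copying is confirmed to be supported at all), cppool cannot be relied on to restore a pool to a snapshot state.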