Hey all,

Also created on the tracker: http://tracker.ceph.com/issues/6047

While playing around on my test cluster, I ran into a problem that I've seen before, but had never been able to reproduce until now. The use of pool snapshots and RBD snapshots appears to be mutually exclusive within the same pool, even if you have only ever used one type of snapshot there and have since deleted all snapshots of that type. Unfortunately, the condition doesn't appear to be handled gracefully yet, leading, in one case, to a monitor crash. I think this goes back at least as far as Bobtail and still exists in Dumpling.

My cluster is a straightforward one with 3 Debian Squeeze nodes, each running a mon, an mds and an osd.

To reproduce:

# ceph osd pool create test 256 256
pool 'test' created
# ceph osd pool mksnap test snapshot
created pool test snap snapshot
# ceph osd pool rmsnap test snapshot
removed pool test snap snapshot

So far, so good. Now we try to create an RBD snapshot in the same pool:

# rbd --pool=test create --size=102400 image
# rbd --pool=test snap create image@snapshot
rbd: failed to create snapshot: (22) Invalid argument
2013-08-18 19:27:50.892291 7f983bc10780 -1 librbd: failed to create snap id: (22) Invalid argument

That failed, but at least the cluster is OK. Now we start over and create the RBD snapshot first:

# ceph osd pool delete test test --yes-i-really-really-mean-it
pool 'test' deleted
# ceph osd pool create test 256 256
pool 'test' created
# rbd --pool=test create --size=102400 image
# rbd --pool=test snap create image@snapshot
# rbd --pool=test snap ls image
SNAPID NAME         SIZE
     2 snapshot 102400 MB
# rbd --pool=test snap rm image@snapshot
# ceph osd pool mksnap test snapshot
2013-08-18 19:35:59.494551 7f48d75a1700  0 monclient: hunting for new mon
^CError EINTR:  (I pressed CTRL-C)

My leader monitor crashed on that last command; here's the apparent critical point in the logs:

    -3> 2013-08-18 19:35:59.315956 7f9b870b1700  1 -- 194.109.43.18:6789/0 <== client.5856 194.109.43.18:0/1030570 8 ==== mon_command({"snap": "snapshot", "prefix": "osd pool mksnap", "pool": "test"} v 0) v1 ==== 107+0+0 (1111983560 0 0) 0x23e4200 con 0x2d202c0
    -2> 2013-08-18 19:35:59.316020 7f9b870b1700  0 mon.a@0(leader) e1 handle_command mon_command({"snap": "snapshot", "prefix": "osd pool mksnap", "pool": "test"} v 0) v1
    -1> 2013-08-18 19:35:59.316033 7f9b870b1700  1 mon.a@0(leader).paxos(paxos active c 1190049..1190629) is_readable now=2013-08-18 19:35:59.316034 lease_expire=2013-08-18 19:36:03.535809 has v0 lc 1190629
     0> 2013-08-18 19:35:59.317612 7f9b870b1700 -1 osd/osd_types.cc: In function 'void pg_pool_t::add_snap(const char*, utime_t)' thread 7f9b870b1700 time 2013-08-18 19:35:59.316102
osd/osd_types.cc: 682: FAILED assert(!is_unmanaged_snaps_mode())

Apart from fixing this assert, and perhaps giving a clearer error message when creating the RBD snapshot fails, maybe there should also be a way to switch from one "snaps mode" to the other without having to delete the entire pool, if such a way doesn't already exist.

BTW: how exactly does one use the pool snapshots? There doesn't seem to be a documented way of listing or using them after creation.

More info available on request.

Regards,
   Oliver
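P.S. To partially answer my own question: the only interface to pool snapshots I came across is the rados CLI, which has lssnap/mksnap/rmsnap/rollback subcommands and a -s/--snap option for reads. Something along these lines should work, though I haven't actually tested it beyond listing, and the object name "obj" and the file paths are just examples:

# rados -p test put obj /etc/hostname
# rados -p test mksnap snapshot
# rados -p test put obj /etc/hosts                  (overwrite the object after the snapshot)
# rados -p test lssnap                              (list the pool's snapshots)
# rados -p test -s snapshot get obj /tmp/obj.old    (read the pre-snapshot version)
# rados -p test rollback obj snapshot               (roll the object back to the snapshot)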