On 08/18/2013 07:11 PM, Oliver Daudey wrote:
Hey all,

Also created on the tracker, under http://tracker.ceph.com/issues/6047

While playing around on my test-cluster, I ran into a problem that I've seen before, but have never been able to reproduce until now. The use of pool-snapshots and rbd-snapshots seems to be mutually exclusive in the same pool, even if you have used one type of snapshot before and have since deleted all snapshots of that type. Unfortunately, the condition doesn't appear to be handled gracefully yet, leading, in one case, to monitors crashing. I think this one goes back at least as far as Bobtail and still exists in Dumpling. My cluster is a straightforward one with 3 Debian Squeeze-nodes, each running a mon, mds and osd.

To reproduce:

# ceph osd pool create test 256 256
pool 'test' created
# ceph osd pool mksnap test snapshot
created pool test snap snapshot
# ceph osd pool rmsnap test snapshot
removed pool test snap snapshot

So far, so good. Now we try to create an rbd-snapshot in the same pool:

# rbd --pool=test create --size=102400 image
# rbd --pool=test snap create image@snapshot
rbd: failed to create snapshot: (22) Invalid argument
2013-08-18 19:27:50.892291 7f983bc10780 -1 librbd: failed to create snap id: (22) Invalid argument

That failed, but at least the cluster is OK. Now we start over again and create the rbd-snapshot first:

# ceph osd pool delete test test --yes-i-really-really-mean-it
pool 'test' deleted
# ceph osd pool create test 256 256
pool 'test' created
# rbd --pool=test create --size=102400 image
# rbd --pool=test snap create image@snapshot
# rbd --pool=test snap ls image
SNAPID NAME SIZE
2 snapshot 102400 MB
# rbd --pool=test snap rm image@snapshot
# ceph osd pool mksnap test snapshot
2013-08-18 19:35:59.494551 7f48d75a1700 0 monclient: hunting for new mon
^CError EINTR: (I pressed CTRL-C)
Thanks for the steps to reproduce, Oliver! I managed to reproduce this on 0.67.1 on the first attempt.
This bug appears to be the same as #5959 on the tracker. I spent some time last week looking into it, and although I realized it was far too easy to trigger on cuttlefish, I never managed to trigger it on next, which I attributed to commit d1501938f5d07c067d908501fc5cfe3c857d7281.
I'll be looking into this.

-Joao
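For readers following the thread, the behaviour described in the report can be pictured with a small toy model. This is only a sketch under stated assumptions, not Ceph's code: `ToyPool`, `SnapModeError` and the `mode` field are hypothetical names, and the only detail taken from the report is the idea of an `is_unmanaged_snaps_mode()` check plus the observation that the snapshot mode appears to stay set after all snapshots are removed. The model raises an error with `EINVAL` in both directions, whereas the report shows an assert failing in the monitor for one of them.

```python
import errno


class SnapModeError(Exception):
    """Raised where this toy model refuses a request (errno attached)."""

    def __init__(self, code, message):
        super().__init__(message)
        self.errno = code


class ToyPool:
    """Hypothetical stand-in for a pool; this is NOT Ceph's pg_pool_t."""

    def __init__(self, name):
        self.name = name
        # Assumption: the mode is set by the first snapshot of either kind
        # and is never cleared, matching what the report observes.
        self.mode = None  # None, "pool" or "unmanaged"
        self.pool_snaps = set()
        self.next_snap_id = 1

    def is_unmanaged_snaps_mode(self):
        return self.mode == "unmanaged"

    def mksnap(self, snap_name):
        """Model of a pool snapshot ('ceph osd pool mksnap')."""
        # The reported crash is an assert on this condition; the toy model
        # refuses the request instead of aborting.
        if self.is_unmanaged_snaps_mode():
            raise SnapModeError(errno.EINVAL, "pool is in unmanaged snaps mode")
        self.mode = "pool"
        self.pool_snaps.add(snap_name)

    def rmsnap(self, snap_name):
        """Removing the last pool snapshot does not reset the mode."""
        self.pool_snaps.discard(snap_name)

    def create_unmanaged_snap_id(self):
        """Model of the snap id an rbd-style snapshot would request."""
        if self.mode == "pool":
            raise SnapModeError(errno.EINVAL, "pool is in pool snaps mode")
        self.mode = "unmanaged"
        snap_id = self.next_snap_id
        self.next_snap_id += 1
        return snap_id


if __name__ == "__main__":
    # First sequence from the report: pool snapshot, remove it, then rbd-style.
    first = ToyPool("test")
    first.mksnap("snapshot")
    first.rmsnap("snapshot")
    try:
        first.create_unmanaged_snap_id()
    except SnapModeError as exc:
        print("rbd-style snapshot refused:", exc)

    # Second sequence: rbd-style snapshot first, then a pool snapshot.
    second = ToyPool("test")
    second.create_unmanaged_snap_id()
    try:
        second.mksnap("snapshot")
    except SnapModeError as exc:
        print("pool snapshot refused:", exc)
```

In this sketch both orderings end in a clean refusal, which is the graceful handling the report asks for; how the real monitor should respond is left to the tracker issue.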
My leader monitor crashed at that last command, here's the apparent critical point in the logs:

-3> 2013-08-18 19:35:59.315956 7f9b870b1700 1 -- 194.109.43.18:6789/0 <== client.5856 194.109.43.18:0/1030570 8 ==== mon_command({"snap": "snapshot", "prefix": "osd pool mksnap", "pool": "test"} v 0) v1 ==== 107+0+0 (1111983560 0 0) 0x23e4200 con 0x2d202c0
-2> 2013-08-18 19:35:59.316020 7f9b870b1700 0 mon.a@0(leader) e1 handle_command mon_command({"snap": "snapshot", "prefix": "osd pool mksnap", "pool": "test"} v 0) v1
-1> 2013-08-18 19:35:59.316033 7f9b870b1700 1 mon.a@0(leader).paxos(paxos active c 1190049..1190629) is_readable now=2013-08-18 19:35:59.316034 lease_expire=2013-08-18 19:36:03.535809 has v0 lc 1190629
0> 2013-08-18 19:35:59.317612 7f9b870b1700 -1 osd/osd_types.cc: In function 'void pg_pool_t::add_snap(const char*, utime_t)' thread 7f9b870b1700 time 2013-08-18 19:35:59.316102
osd/osd_types.cc: 682: FAILED assert(!is_unmanaged_snaps_mode())

Apart from fixing this assert and maybe giving a more clear error-message with the failed creation of the rbd-snapshot, maybe there should be a way to switch from one "snaps_mode" to the other without having to delete the entire pool, if one doesn't already exist.

BTW: How exactly does one use the pool-snapshots? There doesn't seem to be a documented way of listing or using them after creation.

More info available on request.

Regards,

Oliver
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com