Hi, thank you for your reply. dpkg -l ceph 0.56.2-1precise I only have 7 pools, so no, not a big deal. maybe this also has to do with it (I did some crushmap-changes and need to migrate further hosts so the OSDs can self-heal: # ceph health HEALTH_WARN 104 pgs stale; 104 pgs stuck stale; 296 pgs stuck unclean; recovery 2/4374 degraded (0.046%) frankly I can't read much from the debug-output: # rbd snap unprotect rd/debian_template@snap --debug-rbd 20 --debug-ms 1 2013-02-11 20:52:35.570797 7f89078bd780 1 -- :/0 messenger.start 2013-02-11 20:52:35.571165 7f89078bd780 1 -- :/1009032 --> 10.1.91.11:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x1a9ada0 con 0x1a9a8b0 2013-02-11 20:52:35.571601 7f89078b9700 1 -- 10.1.91.11:0/1009032 learned my addr 10.1.91.11:0/1009032 2013-02-11 20:52:35.572485 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 1 ==== mon_map v1 ==== 191+0+0 (2908705101 0 0) 0x7f88f80009f0 con 0x1a9a8b0 2013-02-11 20:52:35.572645 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 2 ==== auth_reply(proto 1 0 Success) v1 ==== 24+0+0 (3088633910 0 0) 0x7f88f8000d80 con 0x1a9a8b0 2013-02-11 20:52:35.572704 7f89029f2700 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x1a9c340 con 0x1a9a8b0 2013-02-11 20:52:35.573048 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x1a9ba50 con 0x1a9a8b0 2013-02-11 20:52:35.573075 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x1a9c610 con 0x1a9a8b0 2013-02-11 20:52:35.573370 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 3 ==== mon_map v1 ==== 191+0+0 (2908705101 0 0) 0x7f88f80009f0 con 0x1a9a8b0 2013-02-11 20:52:35.573425 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 4 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (1252080666 0 0) 0x7f88f8000d80 con 0 x1a9a8b0 2013-02-11 20:52:35.573730 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 5 ==== osd_map(55..55 src has 1..55) v3 ==== 12412+0+0 (1795712789 0 0) 0x7f88f8003 9b0 con 0x1a9a8b0 2013-02-11 20:52:35.574030 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (1252080666 0 0) 0x7f88f8003df0 con 0 x1a9a8b0 2013-02-11 20:52:35.574142 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 7 ==== osd_map(55..55 src has 1..55) v3 ==== 12412+0+0 (1795712789 0 0) 0x7f88f8003 9b0 con 0x1a9a8b0 2013-02-11 20:52:35.574169 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 8 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (1252080666 0 0) 0x7f88f8003df0 con 0 x1a9a8b0 2013-02-11 20:52:35.574232 7f89078bd780 20 librbd: open_image: ictx = 0x1a9cbc0 name = 'debian_template' id = '' snap_name = '' 2013-02-11 20:52:35.574327 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6809/6197 -- osd_op(client.4444.0:1 debian_template.rbd [stat] 4.13040218) v4 -- ?+0 0x1a9dff0 c on 0x1a9dcc0 2013-02-11 20:52:35.575691 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== osd.3 10.1.91.11:6809/6197 1 ==== osd_op_reply(1 debian_template.rbd [stat] = -2 (No such file or directory )) v4 ==== 118+0+0 (3022415672 0 0) 0x7f88e00009a0 con 0x1a9dcc0 2013-02-11 20:52:35.575897 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6809/6197 -- osd_op(client.4444.0:2 rbd_id.debian_template [stat] 4.5574cf14) v4 -- ?+0 0x1a9dff 0 con 0x1a9dcc0 2013-02-11 20:52:35.576703 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== osd.3 10.1.91.11:6809/6197 2 ==== osd_op_reply(2 rbd_id.debian_template [stat] = 0) v4 ==== 121+0+16 (19411 11872 0 2486853210) 0x7f88e0000dc0 con 0x1a9dcc0 2013-02-11 20:52:35.576788 7f89078bd780 20 librbd: detect format of debian_template : new 2013-02-11 20:52:35.576866 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6809/6197 -- osd_op(client.4444.0:3 rbd_id.debian_template [call rbd.get_id] 4.5574cf14) v4 -- ? +0 0x1a9e5b0 con 0x1a9dcc0 2013-02-11 20:52:35.577633 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== osd.3 10.1.91.11:6809/6197 3 ==== osd_op_reply(3 rbd_id.debian_template [call] = 0) v4 ==== 121+0+16 (27681 63342 0 1805449227) 0x7f88e0000a60 con 0x1a9dcc0 2013-02-11 20:52:35.577852 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:4 rbd_header.10ed6b8b4567 [call rbd.get_size,call rbd.get_obje ct_prefix] 4.a33f8c74) v4 -- ?+0 0x1aa2010 con 0x1a9ec50 2013-02-11 20:52:35.579206 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 1 ==== osd_op_reply(4 rbd_header.10ed6b8b4567 [call,call] = 0) v4 ==== 164+0+34 (2238100216 0 1423405897) 0x7f88d8000a90 con 0x1a9ec50 2013-02-11 20:52:35.579351 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:5 rbd_header.10ed6b8b4567 [call rbd.get_stripe_unit_count] 4.a 33f8c74) v4 -- ?+0 0x1a9dff0 con 0x1a9ec50 2013-02-11 20:52:35.580352 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 2 ==== osd_op_reply(5 rbd_header.10ed6b8b4567 [call] = 0) v4 ==== 122+0+16 (4210 705 0 74706562) 0x7f88d8000a90 con 0x1a9ec50 2013-02-11 20:52:35.580427 7f89078bd780 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rbd_data.10ed6b8b4567 format rbd_data.10ed6 b8b4567.%016llx 2013-02-11 20:52:35.580498 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:6 rbd_header.10ed6b8b4567 [watch 1~0] 4.a33f8c74) v4 -- ?+0 0x 1aa2960 con 0x1a9ec50 2013-02-11 20:52:35.582238 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 3 ==== osd_op_reply(6 rbd_header.10ed6b8b4567 [watch 1~0] ondisk = 0) v4 ==== 12 2+0+0 (2406224095 0 0) 0x7f88d8000a90 con 0x1a9ec50 2013-02-11 20:52:35.582346 7f89078bd780 20 librbd: ictx_refresh 0x1a9cbc0 2013-02-11 20:52:35.582437 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:7 rbd_header.10ed6b8b4567 [call rbd.get_size,call rbd.get_feat ures,call rbd.get_snapcontext,call rbd.get_parent,call lock.get_info] 4.a33f8c74) v4 -- ?+0 0x1aa26f0 con 0x1a9ec50 2013-02-11 20:52:35.583438 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 4 ==== osd_op_reply(7 rbd_header.10ed6b8b4567 [call,call,call,call,call] = 0) v4 ==== 290+0+88 (1809167792 0 3048450296) 0x7f88d8000a90 con 0x1a9ec50 2013-02-11 20:52:35.583551 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:8 rbd_header.10ed6b8b4567 [call rbd.get_snapshot_name,call rbd .get_size,call rbd.get_features,call rbd.get_parent,call rbd.get_protection_status] 4.a33f8c74) v4 -- ?+0 0x1aa26f0 con 0x1a9ec50 2013-02-11 20:52:35.584565 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 5 ==== osd_op_reply(8 rbd_header.10ed6b8b4567 [call,call,call,call,call] = 0) v4 ==== 290+0+62 (2211467652 0 1673312616) 0x7f88d8000ef0 con 0x1a9ec50 2013-02-11 20:52:35.584644 7f89078bd780 20 librbd: new snapshot id=6 name=snap size=5242880000 features=3 rbd: unprotecting snap failed: (22) Invalid argument2013-02-11 20:52:35.584724 7f89078bd780 20 librbd: snap_unprotect 0x1a9cbc0 snap 2013-02-11 20:52:35.584730 7f89078bd780 20 librbd: ictx_check 0x1a9cbc0 2013-02-11 20:52:35.584874 7f89078bd780 20 librbd: close_image 0x1a9cbc0 2013-02-11 20:52:35.584878 7f89078bd780 20 librbd: flush 0x1a9cbc0 2013-02-11 20:52:35.584880 7f89078bd780 20 librbd: ictx_check 0x1a9cbc0 2013-02-11 20:52:35.584904 7f89078bd780 1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:9 rbd_header.10ed6b8b4567 [watch 1~0] 4.a33f8c74) v4 -- ?+0 0x 1aa4480 con 0x1a9ec50 2013-02-11 20:52:35.586857 7f89029f2700 1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 6 ==== osd_op_reply(9 rbd_header.10ed6b8b4567 [watch 1~0] = 0) v4 ==== 122+0+0 ( 545292060 0 0) 0x7f88d8000ae0 con 0x1a9ec50 2013-02-11 20:52:35.587162 7f89078bd780 1 -- 10.1.91.11:0/1009032 mark_down 0x1a9ec50 -- 0x1a9ea00 2013-02-11 20:52:35.587259 7f89078bd780 1 -- 10.1.91.11:0/1009032 mark_down 0x1a9dcc0 -- 0x1a9da70 2013-02-11 20:52:35.587517 7f89078bd780 1 -- 10.1.91.11:0/1009032 mark_down_all 2013-02-11 20:52:35.587645 7f89078bd780 1 -- 10.1.91.11:0/1009032 shutdown complete. ________________________________________ Von: Josh Durgin [josh.durgin@xxxxxxxxxxx] Gesendet: Montag, 11. Februar 2013 16:40 Bis: Wolfgang Hennerbichler Cc: ceph-users@xxxxxxxx Betreff: Re: stuck snapshot On 02/11/2013 05:35 AM, Wolfgang Hennerbichler wrote: > Hi, > > My snapshot is stuck, and I don't know how that could have happened: > > root@rd-c2:~# rbd snap rm rd/debian_template@snap > rbd: snapshot 'snap' is protected from removal. > 2013-02-11 14:30:37.935986 7f7a4a4ee780 -1 librbd: removing snapshot > from header failed: (16) Device or resource busy > root@rd-c2:~# rbd snap unprotect rd/debian_template@snap > rbd: unprotecting snap failed: (22) Invalid argument Can you add '--debug-rbd 20 --debug-ms 1' to that unprotect command and share the log? Also, what version of ceph is this? > When I tried to unprotect it initially it just sat there for a long > time, then I killed the "unprotection" and that's where we are now. I > can't find a way around the problem. Do you have a large number of pools? Currently unprotect must check each pool for clones of the snapshot. > any help is really appreciated. > > Wolfgang _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com