Re: stuck snapshot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This looks like a bug in unprotect. I pushed a fix to wip-snap-unprotect, but you should be able to fix it by re-protecting
and then unprotecting the snapshot.

Josh

On 02/11/2013 11:57 AM, Wolfgang Hennerbichler wrote:
Hi,

thank you for your reply.

dpkg -l ceph
0.56.2-1precise

I only have 7 pools, so no, not a big deal.

maybe this also has to do with it (I did some crushmap-changes and need to migrate further hosts so the OSDs can self-heal:
# ceph health
HEALTH_WARN 104 pgs stale; 104 pgs stuck stale; 296 pgs stuck unclean; recovery 2/4374 degraded (0.046%)

frankly I can't read much from the debug-output:

# rbd snap unprotect rd/debian_template@snap --debug-rbd 20 --debug-ms 1
2013-02-11 20:52:35.570797 7f89078bd780  1 -- :/0 messenger.start
2013-02-11 20:52:35.571165 7f89078bd780  1 -- :/1009032 --> 10.1.91.11:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x1a9ada0 con 0x1a9a8b0
2013-02-11 20:52:35.571601 7f89078b9700  1 -- 10.1.91.11:0/1009032 learned my addr 10.1.91.11:0/1009032
2013-02-11 20:52:35.572485 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 1 ==== mon_map v1 ==== 191+0+0 (2908705101 0 0) 0x7f88f80009f0 con 0x1a9a8b0
2013-02-11 20:52:35.572645 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 2 ==== auth_reply(proto 1 0 Success) v1 ==== 24+0+0 (3088633910 0 0) 0x7f88f8000d80
  con 0x1a9a8b0
2013-02-11 20:52:35.572704 7f89029f2700  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x1a9c340 con 0x1a9a8b0
2013-02-11 20:52:35.573048 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x1a9ba50 con 0x1a9a8b0
2013-02-11 20:52:35.573075 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x1a9c610 con 0x1a9a8b0
2013-02-11 20:52:35.573370 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 3 ==== mon_map v1 ==== 191+0+0 (2908705101 0 0) 0x7f88f80009f0 con 0x1a9a8b0
2013-02-11 20:52:35.573425 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 4 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (1252080666 0 0) 0x7f88f8000d80 con 0
x1a9a8b0
2013-02-11 20:52:35.573730 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 5 ==== osd_map(55..55 src has 1..55) v3 ==== 12412+0+0 (1795712789 0 0) 0x7f88f8003
9b0 con 0x1a9a8b0
2013-02-11 20:52:35.574030 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (1252080666 0 0) 0x7f88f8003df0 con 0
x1a9a8b0
2013-02-11 20:52:35.574142 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 7 ==== osd_map(55..55 src has 1..55) v3 ==== 12412+0+0 (1795712789 0 0) 0x7f88f8003
9b0 con 0x1a9a8b0
2013-02-11 20:52:35.574169 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== mon.0 10.1.91.11:6789/0 8 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (1252080666 0 0) 0x7f88f8003df0 con 0
x1a9a8b0
2013-02-11 20:52:35.574232 7f89078bd780 20 librbd: open_image: ictx = 0x1a9cbc0 name = 'debian_template' id = '' snap_name = ''
2013-02-11 20:52:35.574327 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6809/6197 -- osd_op(client.4444.0:1 debian_template.rbd [stat] 4.13040218) v4 -- ?+0 0x1a9dff0 c
on 0x1a9dcc0
2013-02-11 20:52:35.575691 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== osd.3 10.1.91.11:6809/6197 1 ==== osd_op_reply(1 debian_template.rbd [stat] = -2 (No such file or directory
)) v4 ==== 118+0+0 (3022415672 0 0) 0x7f88e00009a0 con 0x1a9dcc0
2013-02-11 20:52:35.575897 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6809/6197 -- osd_op(client.4444.0:2 rbd_id.debian_template [stat] 4.5574cf14) v4 -- ?+0 0x1a9dff
0 con 0x1a9dcc0
2013-02-11 20:52:35.576703 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== osd.3 10.1.91.11:6809/6197 2 ==== osd_op_reply(2 rbd_id.debian_template [stat] = 0) v4 ==== 121+0+16 (19411
11872 0 2486853210) 0x7f88e0000dc0 con 0x1a9dcc0
2013-02-11 20:52:35.576788 7f89078bd780 20 librbd: detect format of debian_template : new
2013-02-11 20:52:35.576866 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6809/6197 -- osd_op(client.4444.0:3 rbd_id.debian_template [call rbd.get_id] 4.5574cf14) v4 -- ?
+0 0x1a9e5b0 con 0x1a9dcc0
2013-02-11 20:52:35.577633 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== osd.3 10.1.91.11:6809/6197 3 ==== osd_op_reply(3 rbd_id.debian_template [call] = 0) v4 ==== 121+0+16 (27681
63342 0 1805449227) 0x7f88e0000a60 con 0x1a9dcc0
2013-02-11 20:52:35.577852 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:4 rbd_header.10ed6b8b4567 [call rbd.get_size,call rbd.get_obje
ct_prefix] 4.a33f8c74) v4 -- ?+0 0x1aa2010 con 0x1a9ec50
2013-02-11 20:52:35.579206 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 1 ==== osd_op_reply(4 rbd_header.10ed6b8b4567 [call,call] = 0) v4 ==== 164+0+34
(2238100216 0 1423405897) 0x7f88d8000a90 con 0x1a9ec50
2013-02-11 20:52:35.579351 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:5 rbd_header.10ed6b8b4567 [call rbd.get_stripe_unit_count] 4.a
33f8c74) v4 -- ?+0 0x1a9dff0 con 0x1a9ec50
2013-02-11 20:52:35.580352 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 2 ==== osd_op_reply(5 rbd_header.10ed6b8b4567 [call] = 0) v4 ==== 122+0+16 (4210
705 0 74706562) 0x7f88d8000a90 con 0x1a9ec50
2013-02-11 20:52:35.580427 7f89078bd780 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rbd_data.10ed6b8b4567 format rbd_data.10ed6
b8b4567.%016llx
2013-02-11 20:52:35.580498 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:6 rbd_header.10ed6b8b4567 [watch 1~0] 4.a33f8c74) v4 -- ?+0 0x
1aa2960 con 0x1a9ec50
2013-02-11 20:52:35.582238 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 3 ==== osd_op_reply(6 rbd_header.10ed6b8b4567 [watch 1~0] ondisk = 0) v4 ==== 12
2+0+0 (2406224095 0 0) 0x7f88d8000a90 con 0x1a9ec50
2013-02-11 20:52:35.582346 7f89078bd780 20 librbd: ictx_refresh 0x1a9cbc0
2013-02-11 20:52:35.582437 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:7 rbd_header.10ed6b8b4567 [call rbd.get_size,call rbd.get_feat
ures,call rbd.get_snapcontext,call rbd.get_parent,call lock.get_info] 4.a33f8c74) v4 -- ?+0 0x1aa26f0 con 0x1a9ec50
2013-02-11 20:52:35.583438 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 4 ==== osd_op_reply(7 rbd_header.10ed6b8b4567 [call,call,call,call,call] = 0) v4
  ==== 290+0+88 (1809167792 0 3048450296) 0x7f88d8000a90 con 0x1a9ec50
2013-02-11 20:52:35.583551 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:8 rbd_header.10ed6b8b4567 [call rbd.get_snapshot_name,call rbd
.get_size,call rbd.get_features,call rbd.get_parent,call rbd.get_protection_status] 4.a33f8c74) v4 -- ?+0 0x1aa26f0 con 0x1a9ec50
2013-02-11 20:52:35.584565 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 5 ==== osd_op_reply(8 rbd_header.10ed6b8b4567 [call,call,call,call,call] = 0) v4
  ==== 290+0+62 (2211467652 0 1673312616) 0x7f88d8000ef0 con 0x1a9ec50
2013-02-11 20:52:35.584644 7f89078bd780 20 librbd: new snapshot id=6 name=snap size=5242880000 features=3
rbd: unprotecting snap failed: (22) Invalid argument2013-02-11 20:52:35.584724 7f89078bd780 20 librbd: snap_unprotect 0x1a9cbc0 snap

2013-02-11 20:52:35.584730 7f89078bd780 20 librbd: ictx_check 0x1a9cbc0
2013-02-11 20:52:35.584874 7f89078bd780 20 librbd: close_image 0x1a9cbc0
2013-02-11 20:52:35.584878 7f89078bd780 20 librbd: flush 0x1a9cbc0
2013-02-11 20:52:35.584880 7f89078bd780 20 librbd: ictx_check 0x1a9cbc0
2013-02-11 20:52:35.584904 7f89078bd780  1 -- 10.1.91.11:0/1009032 --> 10.1.91.11:6803/5976 -- osd_op(client.4444.0:9 rbd_header.10ed6b8b4567 [watch 1~0] 4.a33f8c74) v4 -- ?+0 0x
1aa4480 con 0x1a9ec50
2013-02-11 20:52:35.586857 7f89029f2700  1 -- 10.1.91.11:0/1009032 <== osd.1 10.1.91.11:6803/5976 6 ==== osd_op_reply(9 rbd_header.10ed6b8b4567 [watch 1~0] = 0) v4 ==== 122+0+0 (
545292060 0 0) 0x7f88d8000ae0 con 0x1a9ec50
2013-02-11 20:52:35.587162 7f89078bd780  1 -- 10.1.91.11:0/1009032 mark_down 0x1a9ec50 -- 0x1a9ea00
2013-02-11 20:52:35.587259 7f89078bd780  1 -- 10.1.91.11:0/1009032 mark_down 0x1a9dcc0 -- 0x1a9da70
2013-02-11 20:52:35.587517 7f89078bd780  1 -- 10.1.91.11:0/1009032 mark_down_all
2013-02-11 20:52:35.587645 7f89078bd780  1 -- 10.1.91.11:0/1009032 shutdown complete.

________________________________________
Von: Josh Durgin [josh.durgin@xxxxxxxxxxx]
Gesendet: Montag, 11. Februar 2013 16:40
Bis: Wolfgang Hennerbichler
Cc: ceph-users@xxxxxxxx
Betreff: Re:  stuck snapshot

On 02/11/2013 05:35 AM, Wolfgang Hennerbichler wrote:
Hi,

My snapshot is stuck, and I don't know how that could have happened:

root@rd-c2:~# rbd snap rm rd/debian_template@snap
rbd: snapshot 'snap' is protected from removal.
2013-02-11 14:30:37.935986 7f7a4a4ee780 -1 librbd: removing snapshot
from header failed: (16) Device or resource busy
root@rd-c2:~# rbd snap unprotect rd/debian_template@snap
rbd: unprotecting snap failed: (22) Invalid argument

Can you add '--debug-rbd 20 --debug-ms 1' to that unprotect command
and share the log? Also, what version of ceph is this?

When I tried to unprotect it initially it just sat there for a long
time, then I killed the "unprotection" and that's where we are now. I
can't find a way around the problem.

Do you have a large number of pools? Currently unprotect must check
each pool for clones of the snapshot.

any help is really appreciated.

Wolfgang


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux