pg inconsistent : found clone without head

Hello,

Since yesterday, scrub has been detecting an inconsistent PG :( :

# ceph health detail    (ceph version 0.61.9)
HEALTH_ERR 1 pgs inconsistent; 9 scrub errors
pg 3.136 is active+clean+inconsistent, acting [9,1]
9 scrub errors

# ceph pg map 3.136
osdmap e4363 pg 3.136 (3.136) -> up [9,1] acting [9,1]

But when I try to repair it, the osd.9 daemon crashes:

# ceph pg repair 3.136
instructing pg 3.136 on osd.9 to repair

2013-11-25 10:04:09.758845 7fc2f0706700 0 log [ERR] : 3.136 osd.9 missing 96ad1336/rb.0.32a6.238e1f29.000000034d6a/5ab//3
2013-11-25 10:04:09.759862 7fc2f0706700 0 log [ERR] : repair 3.136 96ad1336/rb.0.32a6.238e1f29.000000034d6a/5ab//3 found clone without head
2013-11-25 10:04:12.872908 7fc2f0706700 0 log [ERR] : 3.136 osd.9 missing e5822336/rb.0.32a6.238e1f29.000000036552/5b3//3
2013-11-25 10:04:12.873064 7fc2f0706700 0 log [ERR] : repair 3.136 e5822336/rb.0.32a6.238e1f29.000000036552/5b3//3 found clone without head
2013-11-25 10:04:14.497750 7fc2f0706700 0 log [ERR] : 3.136 osd.9 missing 38372336/rb.0.32a6.238e1f29.000000011379/5bb//3
2013-11-25 10:04:14.497796 7fc2f0706700 0 log [ERR] : repair 3.136 38372336/rb.0.32a6.238e1f29.000000011379/5bb//3 found clone without head
2013-11-25 10:04:57.557894 7fc2f0706700 0 log [ERR] : 3.136 osd.9 missing 109b8336/rb.0.32a6.238e1f29.00000003ad6b/5ab//3
2013-11-25 10:04:57.558052 7fc2f0706700 0 log [ERR] : repair 3.136 109b8336/rb.0.32a6.238e1f29.00000003ad6b/5ab//3 found clone without head
2013-11-25 10:17:45.835145 7fc2f0706700 0 log [ERR] : 3.136 repair stat mismatch, got 8289/8292 objects, 1981/1984 clones, 26293444608/26294251520 bytes.
2013-11-25 10:17:45.835248 7fc2f0706700 0 log [ERR] : 3.136 repair 4 missing, 0 inconsistent objects
2013-11-25 10:17:45.835320 7fc2f0706700 0 log [ERR] : 3.136 repair 9 errors, 5 fixed
2013-11-25 10:17:45.839963 7fc2f0f07700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_primary(int)' thread 7fc2f0f07700 time 2013-11-25 10:17:45.836790
osd/ReplicatedPG.cc: 6643: FAILED assert(latest->is_update())


The objects (found clone without head) belong to the RBD image below (which is in use):

# rbd info datashare/share3
rbd image 'share3':
	size 1024 GB in 262144 objects
	order 22 (4096 KB objects)
	block_name_prefix: rb.0.32a6.238e1f29
	format: 1
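
For what it's worth, the object names encode the position in the image. A quick sketch of the usual format-1 mapping (assuming the convention block_name_prefix + "." + 12-hex-digit object index, objects of 2**order bytes), to see which part of the image object 000000034d6a covers:

```python
# Sketch: map an RBD format-1 object name back to its byte range in the
# image. Assumes the usual naming convention: block_name_prefix + "." +
# 12-hex-digit object index, with objects of 2**order bytes each.
prefix = "rb.0.32a6.238e1f29"   # from `rbd info`
order = 22                      # from `rbd info`: 4096 KB objects

obj_name = "rb.0.32a6.238e1f29.000000034d6a"
index = int(obj_name.rsplit(".", 1)[1], 16)   # 0x34d6a
start = index << order                        # first byte covered
end = start + (1 << order) - 1                # last byte covered

print(f"object {index:#x} covers bytes {start}..{end} "
      f"(~{start / 2**30:.1f} GiB into the 1024 GB image)")
```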


Directory contents :
In OSD.9 (Primary) :
/var/lib/ceph/osd/ceph-9/current/3.136_head/DIR_6/DIR_3/DIR_3/DIR_1# ls -l rb.0.32a6.238e1f29.000000034d6a*
-rw-r--r-- 1 root root 4194304 nov.  6 02:25 rb.0.32a6.238e1f29.000000034d6a__7ed_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  8 02:40 rb.0.32a6.238e1f29.000000034d6a__7f5_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  9 02:44 rb.0.32a6.238e1f29.000000034d6a__7fd_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 12 02:52 rb.0.32a6.238e1f29.000000034d6a__815_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 14 02:39 rb.0.32a6.238e1f29.000000034d6a__825_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 16 02:45 rb.0.32a6.238e1f29.000000034d6a__835_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 19 01:59 rb.0.32a6.238e1f29.000000034d6a__84d_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 20 02:25 rb.0.32a6.238e1f29.000000034d6a__855_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 22 02:18 rb.0.32a6.238e1f29.000000034d6a__865_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 23 02:24 rb.0.32a6.238e1f29.000000034d6a__86d_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 23 02:24 rb.0.32a6.238e1f29.000000034d6a__head_96AD1336__3

In OSD.1 (Replica) :
/var/lib/ceph/osd/ceph-1/current/3.136_head/DIR_6/DIR_3/DIR_3/DIR_1# ls -l rb.0.32a6.238e1f29.000000034d6a*
-rw-r--r-- 1 root root 4194304 oct. 11 17:13 rb.0.32a6.238e1f29.000000034d6a__5ab_96AD1336__3   <--- ????
-rw-r--r-- 1 root root 4194304 nov.  6 02:25 rb.0.32a6.238e1f29.000000034d6a__7ed_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  8 02:40 rb.0.32a6.238e1f29.000000034d6a__7f5_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  9 02:44 rb.0.32a6.238e1f29.000000034d6a__7fd_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 12 02:52 rb.0.32a6.238e1f29.000000034d6a__815_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 14 02:39 rb.0.32a6.238e1f29.000000034d6a__825_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 16 02:45 rb.0.32a6.238e1f29.000000034d6a__835_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 19 01:59 rb.0.32a6.238e1f29.000000034d6a__84d_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 20 02:25 rb.0.32a6.238e1f29.000000034d6a__855_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 22 02:18 rb.0.32a6.238e1f29.000000034d6a__865_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 23 02:24 rb.0.32a6.238e1f29.000000034d6a__86d_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov. 23 02:24 rb.0.32a6.238e1f29.000000034d6a__head_96AD1336__3


The file rb.0.32a6.238e1f29.000000034d6a__5ab_96AD1336__3 is only present on the replica, osd.1. It seems that this snapshot (5ab) no longer exists.
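As an aside, the directory nesting in the paths above seems to follow the hex digits of the object hash, taken from the least-significant end, which is how I located each file. A sketch of that mapping (the 4-level depth is an assumption taken from the listings above):

```python
# Sketch: derive the filestore subdirectory path from an object hash, on the
# assumption (consistent with the listings above) that each nesting level is
# one hex digit of the hash, starting from the least-significant digit.
def dir_path(obj_hash_hex: str, depth: int = 4) -> str:
    digits = obj_hash_hex[::-1]               # least-significant digit first
    return "/".join(f"DIR_{d.upper()}" for d in digits[:depth])

print(dir_path("96AD1336"))   # -> DIR_6/DIR_3/DIR_3/DIR_1
print(dir_path("E5822336"))   # -> DIR_6/DIR_3/DIR_3/DIR_2
print(dir_path("109B8336"))   # -> DIR_6/DIR_3/DIR_3/DIR_8
```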

# ceph osd dump | grep snap
	removed_snaps [1~c,e~23]
removed_snaps [1~7,9~1,d~2,14~789,7a0~1,7a2~3,7a8~1,7aa~43,7f1~1,7f3~2,7f9~1,7fb~2,801~1,803~2,809~1,80b~2,811~1,813~2,819~1,81b~2,821~1,823~2,829~1,82b~2,831~1,833~2,839~1,83b~2,841~1,843~2,849~1,84b~2,851~1,853~2,859~1,85b~2,861~1,863~2,869~1,86b~2,871~1,873~2,879~1,87b~39,8ba~49]

# for i in `rbd snap ls datashare/share3 | cut -f3 -d ' '`; do printf '%x, ' $i; done
7ed, 7f5, 7fd, 805, 80d, 815, 81d, 825, 82d, 835, 83d, 845, 84d, 855, 85d, 865, 86d, 875, 8b4, 905
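
To double-check, here is a quick sanity check (a sketch; the interval string is copied from the `ceph osd dump` output above, assuming each entry is start~length in hex) confirming that the snap ids of the orphaned clones (5ab, 5b3, 5bb) all fall inside the pool's removed_snaps intervals, while a live snapshot like 7ed does not:

```python
# Sketch: check whether the orphaned clones' snap ids fall inside the
# removed_snaps intervals from `ceph osd dump` for pool 3.
# Assumption: each interval is start~length, both in hex.
removed_snaps = ("1~7,9~1,d~2,14~789,7a0~1,7a2~3,7a8~1,7aa~43,7f1~1,7f3~2,"
                 "7f9~1,7fb~2,801~1,803~2,809~1,80b~2,811~1,813~2,819~1,"
                 "81b~2,821~1,823~2,829~1,82b~2,831~1,833~2,839~1,83b~2,"
                 "841~1,843~2,849~1,84b~2,851~1,853~2,859~1,85b~2,861~1,"
                 "863~2,869~1,86b~2,871~1,873~2,879~1,87b~39,8ba~49")

intervals = []
for part in removed_snaps.split(","):
    start, length = (int(x, 16) for x in part.split("~"))
    intervals.append((start, start + length))    # half-open [start, end)

def is_removed(snap_id: int) -> bool:
    return any(lo <= snap_id < hi for lo, hi in intervals)

for sid in (0x5ab, 0x5b3, 0x5bb, 0x7ed):
    print(f"snap {sid:x}: removed = {is_removed(sid)}")
```

All three orphaned snap ids land inside the 14~789 interval, while 7ed (a snapshot still listed by `rbd snap ls`) does not.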


How can I be sure that these files are no longer needed?

If these files are no longer used, do you think I can remove them manually on osd.1?

Like this?

$ ceph osd set noout
$ service ceph stop osd.1

$ cd /var/lib/ceph/osd/ceph-1/current/3.136_head
$ mv ./DIR_6/DIR_3/DIR_3/DIR_1/rb.0.32a6.238e1f29.000000034d6a__5ab_96AD1336__3 \
     ./DIR_6/DIR_3/DIR_3/DIR_2/rb.0.32a6.238e1f29.000000036552__5b3_E5822336__3 \
     ./DIR_6/DIR_3/DIR_3/DIR_2/rb.0.32a6.238e1f29.000000011379__5bb_38372336__3 \
     ./DIR_6/DIR_3/DIR_3/DIR_8/rb.0.32a6.238e1f29.00000003ad6b__5ab_109B8336__3 \
     /root/temp_obj_backup

$ service ceph start osd.1
$ ceph osd unset noout

$ ceph pg repair 3.136


Thanks,

Laurent Barbe

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



