active+clean+inconsistent: is an unexpected clone

ceph version 10.2.2-508-g9bfc0cf (9bfc0cf178dc21b0fe33e0ce3b90a18858abaf1b)

After adding and re-adding OSDs at the same time on 1 of 3 nodes (size=3, min_size=2; by
mistake I killed the second OSD instead of marking it "out") I got 2 active+clean+inconsistent
PGs (in the RBD pool), both with one of these 2 "new" OSDs as primary.

On the 2 other nodes the MD5 sums are equal; on the primary the "clone" object has zero size.
Trying to remove the object(s) on the primary in various combinations (clone only, clone &
head) just results in the same files being recreated.
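(I compared the replicas directly on the filestore, roughly like this on each node; the
data path is an example for my layout, and the on-disk file name is mangled by filestore
escaping, so I match only on the image id:

# find /var/lib/ceph/osd/ceph-1/current/3.4e_head/ -name '*2d2082ae8944a*'
# md5sum <file found above>
)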

Finally, I copied and then removed the related RBD images, so now both objects are no longer
referenced by anything.
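(Roughly, for each affected image; the names are examples:

# rbd cp rbd/img rbd/img.copy
# rbd snap purge rbd/img
# rbd rm rbd/img
)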

Can I do something like removing every "clone" object (the ones with a clone ID in the name,
not "head") on all 3 replicas? Or will some OSD map check somewhere be unhappy?

Also, the OSDs sometimes hit assertions in various places. One bypass:
--- a/src/osd/ReplicatedPG.cc   2016-09-09 04:44:43.000000000 +0300
+++ b/src/osd/ReplicatedPG.cc   2016-09-09 04:45:10.000000000 +0300
@@ -3369,6 +3369,7 @@ ReplicatedPG::OpContextUPtr ReplicatedPG
   ObjectContextRef obc = get_object_context(coid, false, NULL);
   if (!obc) {
     derr << __func__ << "could not find coid " << coid << dendl;
+    return NULL;
     assert(0);
   }
   assert(obc->ssc);
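
(This just makes the snap trimmer skip the clone it cannot find instead of crashing the whole
OSD; the assert(0) becomes unreachable after the early return. A workaround, not a fix.)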


An example from the log:

# grep -F " 3.4e " /var/log/ceph/ceph.log
2016-09-09 04:03:53.098672 osd.1 10.227.227.103:6801/24502 79 : cluster [INF]
3.4e repair starts
2016-09-09 04:13:14.387856 osd.1 10.227.227.103:6801/24502 80 : cluster [ERR]
3.4e shard 1: soid 3:73d0516f:::rbd_data.2d2082ae8944a.0000000000003239:2368
data_digest 0xffffffff != known data_digest 0x83fa1440 from auth shard 0, size 0
!= known size 4194304
2016-09-09 04:13:14.387916 osd.1 10.227.227.103:6801/24502 81 : cluster [ERR]
repair 3.4e 3:73d0516f:::rbd_data.2d2082ae8944a.0000000000003239:2368 is an
unexpected clone
2016-09-09 07:14:02.269172 osd.1 10.227.227.103:6802/23450 125 : cluster [INF]
3.4e repair starts
2016-09-09 07:25:21.918189 osd.1 10.227.227.103:6802/23450 126 : cluster [ERR]
3.4e shard 1: soid 3:73d0516f:::rbd_data.2d2082ae8944a.0000000000003239:2368
data_digest 0xffffffff != known data_digest 0x83fa1440 from auth shard 0, size 0
!= known size 4194304
2016-09-09 07:25:21.918297 osd.1 10.227.227.103:6802/23450 127 : cluster [ERR]
repair 3.4e 3:73d0516f:::rbd_data.2d2082ae8944a.0000000000003239:2368 is an
unexpected clone
2016-09-09 07:27:19.679535 osd.1 10.227.227.103:6802/23450 128 : cluster [ERR]
3.4e repair 0 missing, 1 inconsistent objects
2016-09-09 07:27:19.679565 osd.1 10.227.227.103:6802/23450 129 : cluster [ERR]
3.4e repair 2 errors, 1 fixed
2016-09-09 07:27:19.692833 osd.1 10.227.227.103:6802/23450 131 : cluster [INF]
3.4e deep-scrub starts
2016-09-09 07:46:53.794432 osd.1 10.227.227.103:6802/23450 132 : cluster [ERR]
3.4e shard 1: soid 3:73d0516f:::rbd_data.2d2082ae8944a.0000000000003239:2368
data_digest 0xffffffff != known data_digest 0x83fa1440 from auth shard 0, size 0
!= known size 4194304
2016-09-09 07:46:53.794546 osd.1 10.227.227.103:6802/23450 133 : cluster [ERR]
deep-scrub 3.4e 3:73d0516f:::rbd_data.2d2082ae8944a.0000000000003239:2368 is an
unexpected clone
2016-09-09 07:49:27.524132 osd.1 10.227.227.103:6802/23450 134 : cluster [ERR]
3.4e deep-scrub 0 missing, 1 inconsistent objects
2016-09-09 07:49:27.524140 osd.1 10.227.227.103:6802/23450 135 : cluster [ERR]
3.4e deep-scrub 2 errors
2016-09-09 13:17:01.440590 osd.1 10.227.227.103:6802/23450 168 : cluster [INF]
3.4e repair starts
2016-09-09 13:27:49.534417 osd.1 10.227.227.103:6802/23450 169 : cluster [ERR]
3.4e shard 1: soid 3:73d0516f:::rbd_data.2d2082ae8944a.0000000000003239:2368
data_digest 0xffffffff != known data_digest 0x83fa1440 from auth shard 0, size 0
!= known size 4194304
2016-09-09 13:27:49.534482 osd.1 10.227.227.103:6802/23450 170 : cluster [ERR]
repair 3.4e 3:73d0516f:::rbd_data.2d2082ae8944a.0000000000003239:2368 is an
unexpected clone
2016-09-09 13:39:44.991204 osd.0 10.227.227.104:6803/32191 130 : cluster [INF]
3.4e starting backfill to osd.7 from (0'0,0'0] MAX to 27023'4325836
2016-09-09 17:18:49.709971 osd.1 10.227.227.103:6802/5237 14 : cluster [INF]
3.4e repair starts
2016-09-09 17:23:18.244064 osd.1 10.227.227.103:6802/5237 15 : cluster [ERR]
3.4e shard 1: soid 3:73d0516f:::rbd_data.2d2082ae8944a.0000000000003239:2368
data_digest 0xffffffff != known data_digest 0x83fa1440 from auth shard 0, size 0
!= known size 4194304
2016-09-09 17:23:18.244116 osd.1 10.227.227.103:6802/5237 16 : cluster [ERR]
repair 3.4e 3:73d0516f:::rbd_data.2d2082ae8944a.0000000000003239:2368 is an
unexpected clone
2016-09-09 17:24:26.490788 osd.1 10.227.227.103:6802/5237 17 : cluster [ERR]
3.4e repair 0 missing, 1 inconsistent objects
2016-09-09 17:24:26.490807 osd.1 10.227.227.103:6802/5237 18 : cluster [ERR]
3.4e repair 2 errors, 1 fixed

-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.by/


