Hello fellow cephalopods,

Every deep scrub seems to dig up inconsistencies (i.e. scrub errors) that we could use some help with diagnosing. I understand there used to be a data corruption issue before 0.80.3, so we made sure all the nodes were upgraded to 0.80.5 and all the daemons were restarted (they all report 0.80.5 when queried via the admin socket). *After* that we ran a deep scrub, which of course found errors, which we then repaired. Unfortunately, it's now a week later and the next deep scrub has dug up new errors, which I don't think should have happened...?

ceph.log shows these errors in between the deep scrub messages:

2014-09-15 07:56:23.164818 osd.15 10.10.10.55:6804/23853 364 : [ERR] 3.335 shard 2: soid 6ba68735/rbd_data.59e3c2ae8944a.00000000000006b1/head//3 digest 3090820441 != known digest 3787996302
2014-09-15 07:56:23.164827 osd.15 10.10.10.55:6804/23853 365 : [ERR] 3.335 shard 6: soid 6ba68735/rbd_data.59e3c2ae8944a.00000000000006b1/head//3 digest 3259686791 != known digest 3787996302
2014-09-15 07:56:28.485713 osd.15 10.10.10.55:6804/23853 366 : [ERR] 3.335 deep-scrub 0 missing, 1 inconsistent objects
2014-09-15 07:56:28.485734 osd.15 10.10.10.55:6804/23853 367 : [ERR] 3.335 deep-scrub 2 errors
2014-09-15 08:57:45.340968 osd.1 10.10.10.53:6800/3553 1100 : [ERR] 3.28a shard 1: soid f0d8268a/rbd_data.590142ae8944a.0000000000000699/head//3 digest 1680449797 != known digest 624976551
2014-09-15 08:57:45.340973 osd.1 10.10.10.53:6800/3553 1101 : [ERR] 3.28a shard 7: soid f0d8268a/rbd_data.590142ae8944a.0000000000000699/head//3 digest 2880845882 != known digest 624976551
2014-09-15 08:57:50.666323 osd.1 10.10.10.53:6800/3553 1102 : [ERR] 3.28a deep-scrub 0 missing, 1 inconsistent objects
2014-09-15 08:57:50.666329 osd.1 10.10.10.53:6800/3553 1103 : [ERR] 3.28a deep-scrub 2 errors

Side question: why do these errors show the public-facing IPs of the OSDs instead of the cluster network IPs? How much of the deep scrub traffic takes place on the public network side of the OSDs, then?

Obviously we could have just repaired these as well, but getting fresh scrub errors every week isn't all that appealing, which is why we've left the cluster in its current state, so we can provide further information if needed.

ceph health detail
HEALTH_ERR 8 pgs inconsistent; 14 scrub errors
pg 3.16c is active+clean+inconsistent, acting [6,4,1]
pg 3.125 is active+clean+inconsistent, acting [1,8,3]
pg 3.103 is active+clean+inconsistent, acting [8,15,4]
pg 3.33 is active+clean+inconsistent, acting [3,10,8]
pg 3.37e is active+clean+inconsistent, acting [10,4,15]
pg 3.335 is active+clean+inconsistent, acting [15,6,2]
pg 3.28a is active+clean+inconsistent, acting [1,8,7]
pg 3.185 is active+clean+inconsistent, acting [6,1,4]
14 scrub errors

Any input on this?

Thanks in advance,
Marc
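
P.S. For reference, this is roughly what we ran for the version check and the repairs (reproduced from memory; the OSD id, socket path and PG id below are just examples from the log above, adjust as needed):

# confirm each daemon really is running 0.80.5, via its admin socket
ceph --admin-daemon /var/run/ceph/ceph-osd.15.asok version

# deep scrub / repair a single PG, e.g. the one from the log above
ceph pg deep-scrub 3.335
ceph pg repair 3.335

Would manually comparing the replicas be a sensible next step, i.e. locating the object file in the filestore on each OSD in the acting set ([15,6,2] for 3.335) and checksumming it, something like:

# matching on the rbd image id avoids dealing with the escaped underscores
# in the on-disk object filename
find /var/lib/ceph/osd/ceph-15/current/3.335_head/ -name '*59e3c2ae8944a.00000000000006b1*' -exec md5sum {} \;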