This is somewhat more likely to have been a bug in the replication logic
(there were a few fixed between 0.53 and 0.55). Had there been any
recent OSD failures?
-Sam

On Mon, Dec 24, 2012 at 10:54 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Tue, 25 Dec 2012, Stefan Priebe wrote:
>> Hello list,
>>
>> Today I got the following ceph status output:
>> 2012-12-25 02:57:00.632945 mon.0 [INF] pgmap v1394388: 7632 pgs: 7631
>> active+clean, 1 active+clean+inconsistent; 151 GB data, 307 GB used, 5028 GB /
>> 5336 GB avail
>>
>>
>> I then grepped for the inconsistent pg with:
>> # ceph pg dump - | grep inconsistent
>> 3.ccf   10      0       0       0       41037824        155930  155930
>> active+clean+inconsistent       2012-12-25 01:51:35.318459      6243'2107
>> 6190'9847       [14,42] [14,42] 6243'2107       2012-12-25 01:51:35.318436
>> 6007'2074       2012-12-23 01:51:24.386366
>>
>> and initiated a repair:
>> # ceph pg repair 3.ccf
>> instructing pg 3.ccf on osd.14 to repair
>>
>> The log output then was:
>> 2012-12-25 02:56:59.056382 osd.14 [ERR] 3.ccf osd.42 missing
>> 1c602ccf/rbd_data.4904d6b8b4567.0000000000000b84/head//3
>> 2012-12-25 02:56:59.056385 osd.14 [ERR] 3.ccf osd.42 missing
>> ceb55ccf/rbd_data.48cc66b8b4567.0000000000001538/head//3
>> 2012-12-25 02:56:59.097989 osd.14 [ERR] 3.ccf osd.42 missing
>> dba6bccf/rbd_data.4797d6b8b4567.00000000000015ad/head//3
>> 2012-12-25 02:56:59.097991 osd.14 [ERR] 3.ccf osd.42 missing
>> a4deccf/rbd_data.45f956b8b4567.00000000000003d5/head//3
>> 2012-12-25 02:56:59.098022 osd.14 [ERR] 3.ccf repair 4 missing, 0 inconsistent
>> objects
>> 2012-12-25 02:56:59.098046 osd.14 [ERR] 3.ccf repair 4 errors, 4 fixed
>>
>> Why doesn't ceph repair this automatically? How could this happen at all?
>
> We just made some fixes to repair in next (it was broken sometime between
> ~0.53 and 0.55). The latest next should repair it. In general we don't
> repair automatically lest we inadvertently propagate bad data or paper
> over a bug.
>
> As for the original source of the missing objects... I'm not sure. There
> were some fixed races related to backfill that could lead to an object
> being missed, but Sam would know more about how likely that actually is.
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
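The repair log quoted in this thread can be tallied per PG and OSD with a short script. This is only a sketch and not something posted by anyone in the thread: it assumes each log entry has been joined onto a single line in the shape `... [ERR] <pgid> osd.<N> missing <object>` as seen above, and the names `MISSING_RE`, `SAMPLE` and `count_missing` are hypothetical. The sample lines are copied from the quoted log output.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt quoted in the thread:
#   <date> <time> osd.14 [ERR] 3.ccf osd.42 missing <object name>
MISSING_RE = re.compile(
    r"\[ERR\]\s+(?P<pgid>[0-9a-f]+\.[0-9a-f]+)\s+osd\.(?P<osd>\d+)\s+missing\s+(?P<obj>\S+)"
)

# Log entries from the thread, each joined onto one line.
SAMPLE = [
    "2012-12-25 02:56:59.056382 osd.14 [ERR] 3.ccf osd.42 missing "
    "1c602ccf/rbd_data.4904d6b8b4567.0000000000000b84/head//3",
    "2012-12-25 02:56:59.056385 osd.14 [ERR] 3.ccf osd.42 missing "
    "ceb55ccf/rbd_data.48cc66b8b4567.0000000000001538/head//3",
    "2012-12-25 02:56:59.097989 osd.14 [ERR] 3.ccf osd.42 missing "
    "dba6bccf/rbd_data.4797d6b8b4567.00000000000015ad/head//3",
    "2012-12-25 02:56:59.097991 osd.14 [ERR] 3.ccf osd.42 missing "
    "a4deccf/rbd_data.45f956b8b4567.00000000000003d5/head//3",
    "2012-12-25 02:56:59.098022 osd.14 [ERR] 3.ccf repair 4 missing, "
    "0 inconsistent objects",
    "2012-12-25 02:56:59.098046 osd.14 [ERR] 3.ccf repair 4 errors, 4 fixed",
]


def count_missing(lines):
    """Count 'missing' object reports, keyed by (pgid, osd id)."""
    counts = Counter()
    for line in lines:
        match = MISSING_RE.search(line)
        if match:  # summary lines ("repair 4 missing, ...") do not match
            counts[(match.group("pgid"), int(match.group("osd")))] += 1
    return counts


if __name__ == "__main__":
    for (pgid, osd), n in sorted(count_missing(SAMPLE).items()):
        print(f"pg {pgid}: osd.{osd} missing {n} object(s)")
```

On the sample above, the per-object count agrees with the "repair 4 missing" summary line that osd.14 logged.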