The ceph-osd daemon relies on fs barriers for correctness. You will want to
remove the nobarrier option to prevent future corruption.
-Sam

On Mon, Dec 31, 2012 at 3:59 AM, Stefan Priebe <s.priebe@xxxxxxxxxxxx> wrote:
> On 31.12.2012 02:10, Samuel Just wrote:
>> Are you using xfs? If so, what mount options?
>
> Yes,
> noatime,nodiratime,nobarrier,logbufs=8,logbsize=256k
>
> Stefan
>
>> On Dec 30, 2012 1:28 PM, "Stefan Priebe" <s.priebe@xxxxxxxxxxxx> wrote:
>>> On 30.12.2012 19:17, Samuel Just wrote:
>>>> This is somewhat more likely to have been a bug in the replication
>>>> logic (there were a few fixed between 0.53 and 0.55). Had there been
>>>> any recent osd failures?
>>>
>>> Yes, I was stressing Ceph with failures (power, link, disk, ...).
>>>
>>> Stefan
>>>
>>>> On Dec 24, 2012 10:55 PM, "Sage Weil" <sage@xxxxxxxxxxx> wrote:
>>>>> On Tue, 25 Dec 2012, Stefan Priebe wrote:
>>>>>> Hello list,
>>>>>>
>>>>>> today I got the following ceph status output:
>>>>>> 2012-12-25 02:57:00.632945 mon.0 [INF] pgmap v1394388: 7632 pgs:
>>>>>> 7631 active+clean, 1 active+clean+inconsistent; 151 GB data,
>>>>>> 307 GB used, 5028 GB / 5336 GB avail
>>>>>>
>>>>>> I then grepped for the inconsistent pg:
>>>>>> # ceph pg dump - | grep inconsistent
>>>>>> 3.ccf  10  0  0  0  41037824  155930  155930
>>>>>>   active+clean+inconsistent  2012-12-25 01:51:35.318459
>>>>>>   6243'2107  6190'9847  [14,42]  [14,42]  6243'2107
>>>>>>   2012-12-25 01:51:35.318436  6007'2074  2012-12-23 01:51:24.386366
>>>>>>
>>>>>> and initiated a repair:
>>>>>> # ceph pg repair 3.ccf
>>>>>> instructing pg 3.ccf on osd.14 to repair
>>>>>>
>>>>>> The log output then was:
>>>>>> 2012-12-25 02:56:59.056382 osd.14 [ERR] 3.ccf osd.42 missing
>>>>>> 1c602ccf/rbd_data.4904d6b8b4567.0000000000000b84/head//3
>>>>>> 2012-12-25 02:56:59.056385 osd.14 [ERR] 3.ccf osd.42 missing
>>>>>> ceb55ccf/rbd_data.48cc66b8b4567.0000000000001538/head//3
>>>>>> 2012-12-25 02:56:59.097989 osd.14 [ERR] 3.ccf osd.42 missing
>>>>>> dba6bccf/rbd_data.4797d6b8b4567.00000000000015ad/head//3
>>>>>> 2012-12-25 02:56:59.097991 osd.14 [ERR] 3.ccf osd.42 missing
>>>>>> a4deccf/rbd_data.45f956b8b4567.00000000000003d5/head//3
>>>>>> 2012-12-25 02:56:59.098022 osd.14 [ERR] 3.ccf repair 4 missing,
>>>>>> 0 inconsistent objects
>>>>>> 2012-12-25 02:56:59.098046 osd.14 [ERR] 3.ccf repair 4 errors, 4 fixed
>>>>>>
>>>>>> Why doesn't ceph repair this automatically? How could this happen
>>>>>> at all?
>>>>>
>>>>> We just made some fixes to repair in next (it was broken sometime
>>>>> between ~0.53 and 0.55). The latest next should repair it. In general
>>>>> we don't repair automatically lest we inadvertently propagate bad
>>>>> data or paper over a bug.
>>>>>
>>>>> As for the original source of the missing objects... I'm not sure.
>>>>> There were some fixed races related to backfill that could lead to an
>>>>> object being missed, but Sam would know more about how likely that
>>>>> actually is.
>>>>>
>>>>> sage
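
To make the suggestion at the top concrete, dropping nobarrier on a single
OSD could look roughly like the sketch below. The mount options are the
ones Stefan listed minus nobarrier; the device path, mount point and the
sysvinit invocation are placeholders and have to be adapted to the actual
deployment:

    service ceph stop osd.14                  # stop the OSD before touching its mount
    umount /var/lib/ceph/osd/ceph-14          # placeholder mount point
    # remount with the same options minus nobarrier, so write barriers are on again
    mount -t xfs -o noatime,nodiratime,logbufs=8,logbsize=256k \
        /dev/sdb1 /var/lib/ceph/osd/ceph-14   # /dev/sdb1 is a placeholder device
    service ceph start osd.14

The same change also has to land in /etc/fstab (or in "osd mount options
xfs" in ceph.conf, if the disks are mounted from there), otherwise
nobarrier comes back on the next reboot.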
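
And the detect/repair cycle discussed in the thread, end to end, is
roughly the following. The pg id and osd numbers are just the values from
above, and the commands are the stock ceph CLI of that era, so double-check
them against the installed version:

    ceph pg dump - | grep inconsistent   # find inconsistent pgs, as Stefan did above
    ceph pg map 3.ccf                    # show the up/acting osds, here [14,42]
    ceph pg repair 3.ccf                 # same repair command, run on a fixed build
    ceph pg scrub 3.ccf                  # re-scrub afterwards to confirm the pg is clean
    ceph -s                              # the inconsistent pg count should drop back to 0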