Re: automatic repair of inconsistent pg?


 



On 30.12.2012 19:17, Samuel Just wrote:
This is somewhat more likely to have been a bug in the replication logic
(a few were fixed between 0.53 and 0.55).  Had there been any
recent OSD failures?

Yes, I was stressing Ceph with failures (power, link, disk, ...).

Stefan

On Dec 24, 2012 10:55 PM, "Sage Weil" <sage@xxxxxxxxxxx> wrote:

    On Tue, 25 Dec 2012, Stefan Priebe wrote:
     > Hello list,
     >
     > Today I got the following ceph status output:
     > 2012-12-25 02:57:00.632945 mon.0 [INF] pgmap v1394388: 7632 pgs: 7631
     > active+clean, 1 active+clean+inconsistent; 151 GB data, 307 GB used,
     > 5028 GB / 5336 GB avail
     >
     >
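To catch this state without reading the status line by eye, here is a minimal sketch of a hypothetical shell function, `count_inconsistent`. It assumes the pgmap summary keeps the "<count> <state>" layout shown above and sums the counts of every state containing "inconsistent". The function name and the exact parsing rules are illustrative assumptions, not part of any Ceph tool.

```shell
# Hypothetical helper (not part of Ceph): read pgmap summary lines on stdin
# and print the total number of PGs in any state containing "inconsistent".
# Assumes the "<count> <state>" pairs seen in the status line above;
# prints 0 when no such state is listed.
count_inconsistent() {
  awk '{
    for (i = 2; i <= NF; i++) {
      state = $i
      sub(/[,;]$/, "", state)            # drop the trailing "," or ";"
      if (state ~ /inconsistent/ && $(i - 1) ~ /^[0-9]+$/)
        total += $(i - 1)                # the count precedes its state
    }
  } END { print total + 0 }'
}
```

Fed the status line above (saved to a file or echoed), it should print 1; a non-zero result is the cue to go looking for the affected PG.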
     > I then grepped for the inconsistent pg with:
     > # ceph pg dump - | grep inconsistent
     > 3.ccf   10      0       0       0       41037824        155930  155930
     > active+clean+inconsistent       2012-12-25 01:51:35.318459 6243'2107
     > 6190'9847       [14,42] [14,42] 6243'2107       2012-12-25 01:51:35.318436
     > 6007'2074       2012-12-23 01:51:24.386366
     >
     > and initiated a repair:
     > #  ceph pg repair 3.ccf
     > instructing pg 3.ccf on osd.14 to repair
     >
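The grep-then-repair steps above can be sketched as two small hypothetical shell functions. `inconsistent_pgids` assumes `ceph pg dump` prints one PG per line with the pgid in the first column, and prints that pgid when any field contains "inconsistent". In keeping with the caution against automatic repair, `print_repair_commands` only prints the `ceph pg repair` commands for an operator to review; it does not run them. Both names are illustrative assumptions.

```shell
# Hypothetical helper: read `ceph pg dump` text on stdin and print the pgid
# (first column) of every line with a field containing "inconsistent".
# Assumes one PG per line, as the tool emits it (unlike the wrapped mail).
inconsistent_pgids() {
  awk '{
    for (i = 2; i <= NF; i++)
      if ($i ~ /inconsistent/) { print $1; next }
  }'
}

# Dry run only: print the repair commands instead of executing them, so a
# human can first check whether the inconsistency points at a bug.
print_repair_commands() {
  inconsistent_pgids | while read -r pgid; do
    printf 'ceph pg repair %s\n' "$pgid"
  done
}
```

Usage would be along the lines of `ceph pg dump - | print_repair_commands`, which for the dump above should print `ceph pg repair 3.ccf`.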
     > The log output then was:
     > 2012-12-25 02:56:59.056382 osd.14 [ERR] 3.ccf osd.42 missing
     > 1c602ccf/rbd_data.4904d6b8b4567.0000000000000b84/head//3
     > 2012-12-25 02:56:59.056385 osd.14 [ERR] 3.ccf osd.42 missing
     > ceb55ccf/rbd_data.48cc66b8b4567.0000000000001538/head//3
     > 2012-12-25 02:56:59.097989 osd.14 [ERR] 3.ccf osd.42 missing
     > dba6bccf/rbd_data.4797d6b8b4567.00000000000015ad/head//3
     > 2012-12-25 02:56:59.097991 osd.14 [ERR] 3.ccf osd.42 missing
     > a4deccf/rbd_data.45f956b8b4567.00000000000003d5/head//3
     > 2012-12-25 02:56:59.098022 osd.14 [ERR] 3.ccf repair 4 missing, 0
     > inconsistent objects
     > 2012-12-25 02:56:59.098046 osd.14 [ERR] 3.ccf repair 4 errors, 4 fixed
     >
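To see which objects (and, for RBD, which images) the repair touched, here is a minimal sketch of two hypothetical shell functions. It assumes each log entry is on one line in the layout "<date> <time> <reporter> [ERR] <pgid> <osd> missing <oid>", with the oid written as "hash/name/snap//pool" as in the log above. `rbd_prefixes` further assumes RBD data objects are named "rbd_data.<id>.<object number>" and prints each distinct "rbd_data.<id>", which should be comparable with the block_name_prefix that `rbd info` reports for an image.

```shell
# Hypothetical helper: print "<pgid> <osd> <object name>" for every
# "missing" error in OSD log lines read on stdin.
missing_objects() {
  awk '$4 == "[ERR]" && $6 ~ /^osd\.[0-9]+$/ && $7 == "missing" {
    split($8, part, "/")      # hash/name/snap//pool: part[2] is the name
    print $5, $6, part[2]
  }'
}

# Hypothetical helper: reduce the missing objects to distinct RBD data
# prefixes ("rbd_data.<id>"), one per line, sorted.
rbd_prefixes() {
  missing_objects | awk '{
    n = split($3, f, ".")
    if (f[1] == "rbd_data" && n == 3) print f[1] "." f[2]
  }' | sort -u
}
```

Summary lines such as "3.ccf repair 4 errors, 4 fixed" are skipped because their sixth field is not an OSD name.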
     > Why doesn't ceph repair this automatically? How could this happen
     > at all?

    We just made some fixes to repair in next (it was broken sometime between
    ~0.53 and 0.55).  The latest next should repair it.  In general we don't
    repair automatically lest we inadvertently propagate bad data or paper
    over a bug.

    As for the original source of the missing objects... I'm not sure.  There
    were some fixed races related to backfill that could lead to an object
    being missed, but Sam would know more about how likely that actually is.

    sage
    --
    To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
    the body of a message to majordomo@xxxxxxxxxxxxxxx
    <mailto:majordomo@xxxxxxxxxxxxxxx>
    More majordomo info at http://vger.kernel.org/majordomo-info.html


