Re: 1 unfound object (but I can find it on-disk on the OSDs!)

Okay, I have now ended up returning the cluster to a healthy state, but by using the version of the object from OSDs 0 and 2 rather than the one on OSD 1. I set the "noout" flag and shut down OSD 1. That appears to have resulted in the cluster being happy to use the version of the object present on the other OSDs. Then, after starting OSD 1 up again, their version was replicated back onto it, so there are no more inconsistencies or unfound objects.
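
For the record, this is roughly the sequence I used. Treat it as a sketch rather than a recipe - the exact way of stopping/starting the OSD daemon depends on your init system (sysvinit here; on systemd it would be "systemctl stop/start ceph-osd@1"):

# ceph osd set noout
# service ceph stop osd.1
# ceph health detail           (repeat until pg 2.3b no longer reports any unfound objects)
# service ceph start osd.1
# ceph osd unset noout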

I had noticed that the object in question corresponded to the first 4 MB of a logical volume within the VM that was used for its root filesystem (which is BTRFS). Comparing the content to the equivalent location on disk on some other similar VMs, I started suspecting that the "extra data" in OSD 1's copy of the object was superfluous anyway. I have now restarted the VM that owns the RBD, and it was at least quite happy mounting the filesystem, so I'm hoping all is well...
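
In case anyone wants to do the same mapping: the hex suffix of the object name is the object index within the image, so with the default 4 MiB object size the offset falls out directly (the actual object size and block_name_prefix can be confirmed with "rbd info"; the pool/image names below are just placeholders):

# rbd info rbd/vm-root-disk | grep -E 'block_name_prefix|order'
object index 0x0000000000000100 = 256
256 * 4 MiB = 1073741824, so the object covers bytes 1 GiB .. 1 GiB + 4 MiB of the RBD image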

Alex

On 03/05/2015 12:55 PM, Alex Moore wrote:
Hi all, I need some help getting my 0.87.1 cluster back into a healthy state...

Overnight, a deep scrub detected an inconsistent object in one of my PGs. Ceph health detail said the following:

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 2.3b is active+clean+inconsistent, acting [1,2,0]
2 scrub errors

And these were the corresponding errors from the log:

2015-05-03 02:47:27.804774 6a8bc3f1e700 -1 log_channel(default) log [ERR] : 2.3b shard 1: soid c886da7b/rbd_data.25212ae8944a.0000000000000100/head//2 digest 1859582522 != known digest 2859280481, size 4194304 != known size 1642496
2015-05-03 02:47:44.099475 6a8bc3f1e700 -1 log_channel(default) log [ERR] : 2.3b deep-scrub stat mismatch, got 655/656 objects, 0/0 clones, 655/656 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 2685746176/2689940480 bytes,0/0 hit_set_archive bytes.
2015-05-03 02:47:44.099496 6a8bc3f1e700 -1 log_channel(default) log [ERR] : 2.3b deep-scrub 0 missing, 1 inconsistent objects
2015-05-03 02:47:44.099501 6a8bc3f1e700 -1 log_channel(default) log [ERR] : 2.3b deep-scrub 2 errors

I located the inconsistent object on-disk on the 3 OSDs (and have saved copies of all three). The copies on OSDs 0 and 2 match each other and have the supposedly "known size" of 1642496. The copy on OSD 1 (the primary) has additional data appended, and a size of 4194304. Within the portion of the file that exists on OSDs 0 and 2, the content on OSD 1 is identical; it just has extra data appended beyond that point.
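
(For reference, this is roughly how I found and compared the on-disk copies - assuming the default FileStore data path, and using a wildcard because the object name is escaped in the filename:)

# find /var/lib/ceph/osd/ceph-1/current/2.3b_head/ -name '*25212ae8944a.0000000000000100*' -exec md5sum {} \;

and the equivalent on the hosts for osd.0 and osd.2, then comparing sizes and checksums.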

As this is part of an RBD (used by a Linux VM, with a filesystem on top), I reasoned that if the "extra data" in OSD 1's copy of the object is not supposed to be there, then it almost certainly maps to an unallocated part of the filesystem within the VM, so having the extra data isn't going to do any harm. I therefore want to stick with the version on OSD 1 (the primary).

I then ran "ceph pg repair 2.3b", as my understanding was that it should replace the copies of the object on OSDs 0 and 2 with the one from the primary OSD, achieving what I want and removing the inconsistency. However, that doesn't seem to be what happened!

Instead I now have 1 unfound object (and it is the same object that had previously been reported as inconsistent), and some IO is now being blocked:

# ceph health detail
HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; 1 requests are blocked > 32 sec; 1 osds have slow requests; recovery -1/1747956 objects degraded (-0.000%); 1/582652 unfound (0.000%)
pg 2.3b is stuck unclean for 533.238307, current state active+recovering, last acting [1,2,0]
pg 2.3b is active+recovering, acting [1,2,0], 1 unfound
1 ops are blocked > 524.288 sec
1 ops are blocked > 524.288 sec on osd.1
1 osds have slow requests
recovery -1/1747956 objects degraded (-0.000%); 1/582652 unfound (0.000%)

# ceph pg 2.3b list_missing
{ "offset": { "oid": "",
      "key": "",
      "snapid": 0,
      "hash": 0,
      "max": 0,
      "pool": -1,
      "namespace": ""},
  "num_missing": 1,
  "num_unfound": 1,
  "objects": [
        { "oid": { "oid": "rbd_data.25212ae8944a.0000000000000100",
              "key": "",
              "snapid": -2,
              "hash": 3364280955,
              "max": 0,
              "pool": 2,
              "namespace": ""},
          "need": "1216'8088646",
          "have": "0'0",
          "locations": []}],
  "more": 0}

However, all three OSDs do still have the corresponding file on-disk, with the same content it had when I first looked at it. I can only assume that, because the data in the object on the primary OSD didn't match the "known size", Ceph somehow decided to invalidate the primary's copy when I issued the "repair", rather than using it as the authoritative version, and now believes it has no good copies of the object at all.
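
(Possibly relevant: as far as I know, "ceph pg 2.3b query" shows in its recovery_state section a "might_have_unfound" list of the OSDs that were probed for the object and the result of each probe, which might confirm whether the primary's copy has indeed been rejected:)

# ceph pg 2.3b query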

How can I persuade Ceph to just go ahead and use the version of rbd_data.25212ae8944a.0000000000000100 that is already on-disk on OSD 1, and push it out to OSDs 0 and 2? Surely there is a way to do that!

Thanks in advance!
Alex
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




