Cluster does not report which objects are unfound for stuck PG

Hello people,

after a series of events and some operational mistakes, 1 PG in our cluster is stuck in the active+recovering+degraded+remapped state, reporting 1 unfound object. We're running Hammer (v0.94.9) on top of Debian Jessie, on 27 nodes and 162 osds, with the default crushmap and the nodeep-scrub flag set. Unfortunately, all pools on our cluster are set up with replica size = 2 and min_size = 1.
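
For reference, this is how the pool sizing can be double-checked ("<pool>" below is just a placeholder for each of our pool names):

# ceph osd pool get <pool> size
# ceph osd pool get <pool> min_size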

My main problem is that ceph pg <pg> list_missing does not report which objects
are considered unfound, making it quite difficult to understand what is
happening and how to recover without doing any more damage. Specifically, the
output of the command is this:

# ceph pg 5.658 list_missing
{
    "offset": {
        "oid": "",
        "key": "",
        "snapid": 0,
        "hash": 0,
        "max": 0,
        "pool": -1,
        "namespace": ""
    },
    "num_missing": 0,
    "num_unfound": 1,
    "objects": [],
    "more": 0
}
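
As a side note, the recovery_state section of the pg query output (the full query output is linked at the end [1]) is where I would expect a might_have_unfound list showing which OSDs the primary still wants to probe for the object. A rough way to pull that out, assuming the usual JSON layout of the query output, is:

# ceph pg 5.658 query | grep -A 20 might_have_unfound

I would expect entries there with a per-osd "status" field (e.g. "already probed" or "osd is down"), but I am not sure how that interacts with the empty list_missing output above.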

I took a look at Ceph's official docs and at older threads on this list, but in every case I found, Ceph was reporting the objects that it could not find.

Our cluster got into that state after a series of events and mistakes. I will
provide some timestamps too.
* The osds of one node (6 osds) were down+out because of a recent failure
* We decided to start one osd (osd.120) to see how it would behave
* At 14:56:06 we start osd.120
* After starting osd.120, we noticed that recovery started. As I understand now, we did not want the osd to rejoin the cluster, so we decided to take it down again. In hindsight this looks like a panic move, but anyway, it happened.
* At 14:57:23 we shut down osd.120.
* Some pgs that were mapped on osd.120 are reported to be down, and stuck requests targeting those osds are popping up (a command for listing the affected pgs is sketched right after this timeline). Of course, that meant that we needed to start the osd again.
* At 15:02:59 we start osd.120. PGs are getting up and start peering.
* At 15:03:24, osd.33 (living on a different node) crashes with the following
assertion:

0> 2017-09-08 15:03:24.041412 7ff679fa4700 -1 osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::on_local_recover(const hobject_t&, const object_stat_sum_t&, const ObjectRecoveryInfo&, ObjectContextRef, ObjectStore::Transaction*)' thread 7ff679fa4700 time 2017-09-08 15:03:24.002997
osd/ReplicatedPG.cc: 211: FAILED assert(is_primary())

* At 15:03:29 cluster reports that 1 object is unfound. We start investigating
the issue.
* After some time, we noticed that pgs mapped to osd.33 were degraded, so we decided to start osd.33 again. It seemed to start normally, without any issues.
* After some time, recovery almost finished, with all pgs being in a healthy state, except pg 5.658, which should contain the unfound object.
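
As a side note on the steps above: a quick way to see which pgs would be affected before stopping an osd again (assuming the pg ls-by-osd subcommand is available on this Hammer release) is:

# ceph pg ls-by-osd 120

which lists the pgs currently mapped to osd.120.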

Our cluster is now in the following state:

# ceph -s
    cluster 287f8859-9887-4bb3-ae27-531d2a1dbc95
     health HEALTH_WARN
            1 pgs degraded
            1 pgs recovering
            1 pgs stuck degraded
            1 pgs stuck unclean
            recovery 13/74653914 objects degraded (0.000%)
            recovery 300/74653914 objects misplaced (0.000%)
            recovery 1/37326882 unfound (0.000%)
            nodeep-scrub flag(s) set
     monmap e1: 3 mons at {rd0-00=some_ip:6789/0,rd0-01=some_ip2:6789/0,rd0-02=some_ip3:6789/0}
            election epoch 5462, quorum 0,1,2 rd0-00,rd0-01,rd0-02
     osdmap e379262: 162 osds: 157 up, 157 in; 1 remapped pgs
            flags nodeep-scrub
      pgmap v135824695: 18432 pgs, 5 pools, 98880 GB data, 36452 kobjects
            193 TB used, 89649 GB / 280 TB avail
            13/74653914 objects degraded (0.000%)
            300/74653914 objects misplaced (0.000%)
            1/37326882 unfound (0.000%)
               18430 active+clean
                   1 active+recovering+degraded+remapped
                   1 active+clean+scrubbing
  client io 9776 kB/s rd, 10937 kB/s wr, 863 op/s

# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 1 pgs stuck degraded; 1 pgs stuck unclean; recovery 13/74653918 objects degraded (0.000%); recovery 300/74653918 objects misplaced (0.000%); recovery 1/37326884 unfound (0.000%); nodeep-scrub flag(s) set
pg 5.658 is stuck unclean for 541763.344743, current state active+recovering+degraded+remapped, last acting [120,155]
pg 5.658 is stuck degraded for 201445.628108, current state active+recovering+degraded+remapped, last acting [120,155]
pg 5.658 is active+recovering+degraded+remapped, acting [120,155], 1 unfound
recovery 13/74653918 objects degraded (0.000%)
recovery 300/74653918 objects misplaced (0.000%)
recovery 1/37326884 unfound (0.000%)
nodeep-scrub flag(s) set

# ceph pg dump_stuck unclean
ok
pg_stat state   up      up_primary      acting  acting_primary
5.658 active+recovering+degraded+remapped [120,153] 120 [120,155] 120
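
Note that the up set [120,153] differs from the acting set [120,155], which is consistent with the remapped state. As a sanity check, the mapping can also be printed with:

# ceph pg map 5.658

whose output should look roughly like "osdmap e379262 pg 5.658 (5.658) -> up [120,153] acting [120,155]".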

# ceph pg 5.658 query
The output can be found here [1].

Also, we took a glance at the logs but did not notice anything strange except the crashed osd and its error messages. Unfortunately, we have not yet investigated the logs further, nor looked more into the crashed osd (osd.33).

Are there cases where a ceph cluster can report unfound objects, without even knowing which they are? Is that behavior expected or did we hit a bug? Has anyone encountered anything similar? If yes, how did you interpret the output of the command and how did you proceed in order to return the pg and the cluster
to a healthy state?
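
For completeness, the only documented last resort I am aware of for a genuinely unfound object is to give it up via mark_unfound_lost, which I would rather not run blindly while list_missing stays empty (from what the docs describe, the choice between revert and delete depends on whether an older copy of the object exists):

# ceph pg 5.658 mark_unfound_lost revert
# ceph pg 5.658 mark_unfound_lost delete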

Best regards,
Nikos.

[1] https://pithos.okeanos.grnet.gr/public/fxrzW3tJYa8v7rPpcYxbF1


