Cluster does not report which objects are unfound for stuck PG

Hello people,

after a series of events and some operational mistakes, 1 PG in our cluster is stuck in the active+recovering+degraded+remapped state, reporting 1 unfound object. We're running Hammer (v0.94.9) on top of Debian Jessie, on 27 nodes and 162 osds, with the default crushmap and the nodeep-scrub flag set. Unfortunately, all pools on our cluster are set up with replica size = 2 and min_size = 1.
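
For reference, this is how the pool sizing can be double-checked ("<pool>" below is just a placeholder for each of our pool names):

# ceph osd pool get <pool> size
# ceph osd pool get <pool> min_size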

My main problem is that ceph pg <pg> list_missing does not report which objects
are considered unfound, making it quite difficult to understand what is
happening and how to recover without doing any more damage. Specifically, the
output of the command is this:

# ceph pg 5.658 list_missing
{
    "offset": {
        "oid": "",
        "key": "",
        "snapid": 0,
        "hash": 0,
        "max": 0,
        "pool": -1,
        "namespace": ""
    },
    "num_missing": 0,
    "num_unfound": 1,
    "objects": [],
    "more": 0
}
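
As a side note, the recovery_state section of the pg query output (the full query output is linked at the end [1]) is where I would expect a might_have_unfound list showing which OSDs the primary still wants to probe for the object. A rough way to pull that out, assuming the usual JSON layout of the query output, is:

# ceph pg 5.658 query | grep -A 20 might_have_unfound

I would expect entries there with a per-osd "status" field (e.g. "already probed" or "osd is down"), but I am not sure how that interacts with the empty list_missing output above.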

I took a look at Ceph's official docs and at older threads on this list, but in every case I found, Ceph was reporting the objects that it could not find.

Our cluster got into that state after a series of events and mistakes. I will
provide some timestamps too.
* The osds of one node (6 osds) were down+out because of a recent failure
* We decided to start one osd (osd.120) to see how it would behave
* At 14:56:06 we start osd.120
* After starting osd.120, we noticed that recovery started. As I understand now, we did not want the osd to rejoin the cluster, so we decided to take it down again. In hindsight this looks like a panic move, but anyway, it happened.
* At 14:57:23 we shut down osd.120.
* Some pgs that were mapped on osd.120 are reported to be down, and stuck requests targeting those osds are popping up (a command for listing the affected pgs is sketched right after this timeline). Of course, that meant that we needed to start the osd again.
* At 15:02:59 we start osd.120. PGs are getting up and start peering.
* At 15:03:24, osd.33 (living on a different node) crashes with the following
assertion:

0> 2017-09-08 15:03:24.041412 7ff679fa4700 -1 osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::on_local_recover(const hobject_t&, const object_stat_sum_t&, const ObjectRecoveryInfo&, ObjectContextRef, ObjectStore::Transaction*)' thread 7ff679fa4700 time 2017-09-08 15:03:24.002997
osd/ReplicatedPG.cc: 211: FAILED assert(is_primary())

* At 15:03:29 cluster reports that 1 object is unfound. We start investigating
the issue.
* After some time, we noticed that pgs mapped to osd.33 were degraded, so we decided to start osd.33 again. It seemed to start normally, without any issues.
* After some time, recovery almost finished, with all pgs being in a healthy state, except pg 5.658, which should contain the unfound object.
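
As a side note on the steps above: a quick way to see which pgs would be affected before stopping an osd again (assuming the pg ls-by-osd subcommand is available on this Hammer release) is:

# ceph pg ls-by-osd 120

which lists the pgs currently mapped to osd.120.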

Our cluster is now in the following state:

# ceph -s
    cluster 287f8859-9887-4bb3-ae27-531d2a1dbc95
     health HEALTH_WARN
            1 pgs degraded
            1 pgs recovering
            1 pgs stuck degraded
            1 pgs stuck unclean
            recovery 13/74653914 objects degraded (0.000%)
            recovery 300/74653914 objects misplaced (0.000%)
            recovery 1/37326882 unfound (0.000%)
            nodeep-scrub flag(s) set
     monmap e1: 3 mons at {rd0-00=some_ip:6789/0,rd0-01=some_ip2:6789/0,rd0-02=some_ip3:6789/0}
            election epoch 5462, quorum 0,1,2 rd0-00,rd0-01,rd0-02
     osdmap e379262: 162 osds: 157 up, 157 in; 1 remapped pgs
            flags nodeep-scrub
      pgmap v135824695: 18432 pgs, 5 pools, 98880 GB data, 36452 kobjects
            193 TB used, 89649 GB / 280 TB avail
            13/74653914 objects degraded (0.000%)
            300/74653914 objects misplaced (0.000%)
            1/37326882 unfound (0.000%)
               18430 active+clean
                   1 active+recovering+degraded+remapped
                   1 active+clean+scrubbing
  client io 9776 kB/s rd, 10937 kB/s wr, 863 op/s

# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 1 pgs stuck degraded; 1 pgs stuck unclean; recovery 13/74653918 objects degraded (0.000%); recovery 300/74653918 objects misplaced (0.000%); recovery 1/37326884 unfound (0.000%); nodeep-scrub flag(s) set
pg 5.658 is stuck unclean for 541763.344743, current state active+recovering+degraded+remapped, last acting [120,155]
pg 5.658 is stuck degraded for 201445.628108, current state active+recovering+degraded+remapped, last acting [120,155]
pg 5.658 is active+recovering+degraded+remapped, acting [120,155], 1 unfound
recovery 13/74653918 objects degraded (0.000%)
recovery 300/74653918 objects misplaced (0.000%)
recovery 1/37326884 unfound (0.000%)
nodeep-scrub flag(s) set

# ceph pg dump_stuck unclean
ok
pg_stat state   up      up_primary      acting  acting_primary
5.658 active+recovering+degraded+remapped [120,153] 120 [120,155] 120
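
Note that the up set [120,153] differs from the acting set [120,155], which is consistent with the remapped state. As a sanity check, the mapping can also be printed with:

# ceph pg map 5.658

whose output should look roughly like "osdmap e379262 pg 5.658 (5.658) -> up [120,153] acting [120,155]".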

# ceph pg 5.658 query
The output can be found here [1].

Also, we took a glance at the logs but did not notice anything strange except the crashed osd and its error messages. Unfortunately, we have not yet investigated the logs further, nor looked more into the crashed osd (osd.33).

Are there cases where a ceph cluster can report unfound objects, without even knowing which they are? Is that behavior expected or did we hit a bug? Has anyone encountered anything similar? If yes, how did you interpret the output of the command and how did you proceed in order to return the pg and the cluster
to a healthy state?
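
For completeness, the only documented last resort I am aware of for a genuinely unfound object is to give it up via mark_unfound_lost, which I would rather not run blindly while list_missing stays empty (from what the docs describe, the choice between revert and delete depends on whether an older copy of the object exists):

# ceph pg 5.658 mark_unfound_lost revert
# ceph pg 5.658 mark_unfound_lost delete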

Best regards,
Nikos.

[1] https://pithos.okeanos.grnet.gr/public/fxrzW3tJYa8v7rPpcYxbF1


