In my test environment I changed the reweight of an OSD. After this, some PGs got stuck in the 'active+remapped' state. I can only repair it by stepping back to the old value of the reweight.

Here is my ceph tree:

> # id    weight  type name               up/down reweight
> -1      12      root default
> -4      12         room serverroom
> -2      12            host test1
> 0       2                osd.0          up      0.7439
> 1       2                osd.1          up      0.9
> 2       4                osd.2          up      1
> 3       4                osd.3          up      1
> -3      0             host test2

I changed osd.1 from 1.0 to 0.9 and then this happened:

> :# ceph health detail
> HEALTH_WARN 10 pgs stuck unclean; recovery 94/2976 objects misplaced (3.159%)
> pg 6.4 is stuck unclean for 1135.549938, current state active+remapped, last acting [1,2,3]
> [...]

ceph pg dump shows the primary OSD of PG 6.4 as non-existent (MAXINT). I have no idea what happened here. Pool 6 is an erasure coded pool (k=2, m=1).

Here is the last part of the query output for PG 6.4:

> :# ceph pg 6.4 query
> [...]
> "recovery_state": [
>       { "name": "Started\/Primary\/Active",
>         "enter_time": "2015-01-03 17:16:10.054846",
>         "might_have_unfound": [],
>         "recovery_progress": { "backfill_targets": [],
>             "waiting_on_backfill": [],
>             "last_backfill_started": "0\/\/0\/\/-1",
>             "backfill_info": { "begin": "0\/\/0\/\/-1",
>                 "end": "0\/\/0\/\/-1",
>                 "objects": []},
>             "peer_backfill_info": [],
>             "backfills_in_flight": [],
>             "recovering": [],
>             "pg_backend": { "recovery_ops": [],
>                 "read_ops": []}},
>         "scrub": { "scrubber.epoch_start": "0",
>             "scrubber.active": 0,
>             "scrubber.block_writes": 0,
>             "scrubber.waiting_on": 0,
>             "scrubber.waiting_on_whom": []}},
>       { "name": "Started",
>         "enter_time": "2015-01-03 17:16:09.073069"}],

Any idea what happened, or have I done something wrong here?

Greetings!
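
P.S. I no longer have the exact shell history, so take the following as a rough sketch of the commands I used rather than a verbatim transcript:

> :# ceph osd reweight 1 0.9    # the change that triggered the remapping
> :# ceph osd reweight 1 1.0    # stepping back to the old value makes the PGs clean again

(This is the override reweight shown in the last column of the tree above, not the CRUSH weight.)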
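
P.P.S. To look at the mapping of the stuck PG and the pool settings I used something like this (again from memory):

> :# ceph pg map 6.4
> :# ceph pg dump | grep ^6.4
> :# ceph osd dump | grep 'pool 6'

The pg dump output is where the non-existent (MAXINT) primary shows up.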