Re: PGs stuck active+remapped and osds lose data?!


 



e.g., OSD.7 / 3 / 0 are in the same acting set. If they are running
properly, they should all appear in the up set as well.

# 9.7
 <snip>
>    "up": [
>        7,
>        3
>    ],
>    "acting": [
>        7,
>        3,
>        0
>    ],
 <snip>
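
If you want to check this quickly for each of the stuck PGs, something
like the following should work (the jq filter is only an illustration
and assumes jq is installed):

  # short form: prints the up and acting sets on one line
  ceph pg map 9.7

  # or pull just the two sets out of the full query output
  ceph pg 9.7 query | jq '{up: .up, acting: .acting}'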

Here is an example of a healthy PG, where the up and acting sets match:

  "up": [
    1,
    0,
    2
  ],
  "acting": [
    1,
    0,
    2
  ],
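
To list the PGs that are stuck in this state, something along these
lines should do (output columns may vary by release):

  # PGs that have been unclean longer than the stuck threshold
  ceph pg dump_stuck unclean

  # brief per-PG listing; grep for the remapped ones
  ceph pg dump pgs_brief | grep remapped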

Regards,


On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:
>>
>> That's not perfectly correct.
>>
>> OSD.0/1/2 seem to be down.
>
>
> Sorry, but where do you see this? I think this indicates that they are up: osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?
>
>
>> Am 10.01.2017 um 07:50 schrieb Shinobu Kinjo <skinjo@xxxxxxxxxx>:
>>
>> On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:
>>> All osds are currently up:
>>>
>>>     health HEALTH_WARN
>>>            4 pgs stuck unclean
>>>            recovery 4482/58798254 objects degraded (0.008%)
>>>            recovery 420522/58798254 objects misplaced (0.715%)
>>>            noscrub,nodeep-scrub flag(s) set
>>>     monmap e9: 5 mons at
>>> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>>            election epoch 478, quorum 0,1,2,3,4
>>> ceph1,ceph2,ceph3,ceph4,ceph5
>>>     osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>            flags noscrub,nodeep-scrub
>>>      pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
>>>            15070 GB used, 40801 GB / 55872 GB avail
>>>            4482/58798254 objects degraded (0.008%)
>>>            420522/58798254 objects misplaced (0.715%)
>>>                 316 active+clean
>>>                   4 active+remapped
>>>  client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>>>
>>> This did not change for two days or so.
>>>
>>>
>>> By the way, my ceph osd df now looks like this:
>>>
>>> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
>>> 0 1.28899  1.00000  3724G  1699G  2024G 45.63 1.69
>>> 1 1.57899  1.00000  3724G  1708G  2015G 45.87 1.70
>>> 2 1.68900  1.00000  3724G  1695G  2028G 45.54 1.69
>>> 3 6.78499  1.00000  7450G  1241G  6208G 16.67 0.62
>>> 4 8.39999  1.00000  7450G  1228G  6221G 16.49 0.61
>>> 5 9.51500  1.00000  7450G  1239G  6210G 16.64 0.62
>>> 6 7.66499  1.00000  7450G  1265G  6184G 16.99 0.63
>>> 7 9.75499  1.00000  7450G  2497G  4952G 33.52 1.24
>>> 8 9.32999  1.00000  7450G  2495G  4954G 33.49 1.24
>>>              TOTAL 55872G 15071G 40801G 26.97
>>> MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
>>>
>>> As you can see, osd2 has now also gone down to 45% use and "lost" data. But I
>>> also think this is no problem and Ceph just cleans everything up after
>>> backfilling.
>>>
>>>
>>> Am 10.01.2017 um 07:29 schrieb Shinobu Kinjo <skinjo@xxxxxxxxxx>:
>>>
>>> Looking at ``ceph -s`` you originally provided, all OSDs are up.
>>>
>>> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>
>>>
>>> But looking at ``pg query``, OSD.0 / 1 are not up. Are they somehow
>>
>> That's not perfectly correct.
>>
>> OSD.0/1/2 seem to be down.
>>
>>> related to this?:
>>>
>>> Ceph1, ceph2 and ceph3 are vms on one physical host
>>>
>>>
>>> Are those OSDs running on vm instances?
>>>
>>> # 9.7
>>> <snip>
>>>
>>>  "state": "active+remapped",
>>>  "snap_trimq": "[]",
>>>  "epoch": 3114,
>>>  "up": [
>>>      7,
>>>      3
>>>  ],
>>>  "acting": [
>>>      7,
>>>      3,
>>>      0
>>>  ],
>>>
>>> <snip>
>>>
>>> # 7.84
>>> <snip>
>>>
>>>  "state": "active+remapped",
>>>  "snap_trimq": "[]",
>>>  "epoch": 3114,
>>> "up": [
>>>      4,
>>>      8
>>>  ],
>>>  "acting": [
>>>      4,
>>>      8,
>>>      1
>>>  ],
>>>
>>> <snip>
>>>
>>> # 8.1b
>>> <snip>
>>>
>>>  "state": "active+remapped",
>>>  "snap_trimq": "[]",
>>>  "epoch": 3114,
>>>  "up": [
>>>      4,
>>>      7
>>>  ],
>>>  "acting": [
>>>      4,
>>>      7,
>>>      2
>>>  ],
>>>
>>> <snip>
>>>
>>> # 7.7a
>>> <snip>
>>>
>>>  "state": "active+remapped",
>>>  "snap_trimq": "[]",
>>>  "epoch": 3114,
>>>  "up": [
>>>      7,
>>>      4
>>>  ],
>>>  "acting": [
>>>      7,
>>>      4,
>>>      2
>>>  ],
>>>
>>> <snip>
>>>
>>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



