Re: pgs stuck inactive

So this is why it happened, I guess.

pool 3 'volumes' replicated size 3 min_size 1

min_size = 1 is a recipe for disasters like this, and there are plenty
of ML threads about why it should not be set below 2.
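
Once things are stable I'd raise it, e.g. for the pool above:

    ceph osd pool set volumes min_size 2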

The past intervals in the pg query show several intervals where a
single OSD may have gone rw, i.e. may have accepted writes the other
copies never saw, which is why peering can't complete now.
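
You can pull those intervals out of the query yourself; as a rough
sketch (the exact JSON layout varies between releases):

    ceph pg 3.367 query > pg3.367.json
    grep -B 2 -A 8 maybe_went_rw pg3.367.json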

How important is this data?

I would suggest checking which of these OSDs actually have the data
for this pg. From the pg query it looks like 2, 35 and 68, and
possibly 28 since it's the primary. Check all OSDs in the pg query
output. I would then back up all copies, work out which copy, if any,
you want to keep, and then attempt something like the following.

https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg17820.html
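
As a rough sketch of the backup step there (filestore-era
ceph-objectstore-tool; stop the OSD daemon first, and adjust the
default paths and the example osd.2 to your deployment):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
        --journal-path /var/lib/ceph/osd/ceph-2/journal \
        --op export --pgid 3.367 --file /root/pg3.367.osd2.export

Repeat for each OSD that might hold a copy (35, 68 and 28), then
restart the daemons.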

If you want to abandon the pg see
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012778.html
for a possible solution.
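
My understanding is that route boils down to recreating the PG empty,
which permanently discards whatever it held, i.e. something like:

    ceph pg force_create_pg 3.367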

http://ceph.com/community/incomplete-pgs-oh-my/ may also give some ideas.


On Fri, Mar 10, 2017 at 9:44 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
> The OSDs are all there.
>
> $ sudo ceph osd stat
>      osdmap e60609: 72 osds: 72 up, 72 in
>
> and I have attached the results of the ceph osd tree and ceph osd dump
> commands. I got some extra info about the network problem: a faulty
> network device flooded the network, eating up all the bandwidth, so the
> OSDs were not able to communicate properly with each other. This lasted
> for almost a day.
>
> Thank you,
> Laszlo
>
>
>
> On 10.03.2017 12:19, Brad Hubbard wrote:
>>
>> To me it looks like someone may have done an "rm" on these OSDs but
>> not removed them from the crushmap. This does not happen
>> automatically.
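>>
>> For the record, a full removal normally looks something like the
>> following; skipping the crush step is what leaves dangling entries
>> like these behind:
>>
>>     ceph osd out 17
>>     ceph osd crush remove osd.17
>>     ceph auth del osd.17
>>     ceph osd rm 17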
>>
>> Do these OSDs show up in "ceph osd tree" and "ceph osd dump" ? If so,
>> paste the output.
>>
>> Without knowing what exactly happened here it may be difficult to work
>> out how to proceed.
>>
>> In order to go clean, the primary needs to communicate with multiple
>> OSDs, some of which are marked DNE and seem to be uncontactable.
>>
>> This seems to be more than a network issue (unless the outage is still
>> happening).
>>
>>
>> http://docs.ceph.com/docs/master/rados/operations/pg-states/?highlight=incomplete
>>
>>
>>
>> On Fri, Mar 10, 2017 at 6:09 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx>
>> wrote:
>>>
>>> Hello,
>>>
>>> I was informed that the ceph cluster network was affected by a
>>> networking issue. There was huge packet loss, and network interfaces
>>> were flapping. That's all I got.
>>> The outage lasted quite a long time, so I assume some OSDs may have
>>> been considered dead and the data on them moved away to other PGs
>>> (this is what ceph is supposed to do, if I'm correct). That was
>>> probably the point when the listed PGs appeared in the picture.
>>> From the query we can see this for one of those OSDs:
>>>         {
>>>             "peer": "14",
>>>             "pgid": "3.367",
>>>             "last_update": "0'0",
>>>             "last_complete": "0'0",
>>>             "log_tail": "0'0",
>>>             "last_user_version": 0,
>>>             "last_backfill": "MAX",
>>>             "purged_snaps": "[]",
>>>             "history": {
>>>                 "epoch_created": 4,
>>>                 "last_epoch_started": 54899,
>>>                 "last_epoch_clean": 55143,
>>>                 "last_epoch_split": 0,
>>>                 "same_up_since": 60603,
>>>                 "same_interval_since": 60603,
>>>                 "same_primary_since": 60593,
>>>                 "last_scrub": "2852'33528",
>>>                 "last_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>                 "last_deep_scrub": "2852'16480",
>>>                 "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
>>>                 "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150"
>>>             },
>>>             "stats": {
>>>                 "version": "0'0",
>>>                 "reported_seq": "14",
>>>                 "reported_epoch": "59779",
>>>                 "state": "down+peering",
>>>                 "last_fresh": "2017-02-27 16:30:16.230519",
>>>                 "last_change": "2017-02-27 16:30:15.267995",
>>>                 "last_active": "0.000000",
>>>                 "last_peered": "0.000000",
>>>                 "last_clean": "0.000000",
>>>                 "last_became_active": "0.000000",
>>>                 "last_became_peered": "0.000000",
>>>                 "last_unstale": "2017-02-27 16:30:16.230519",
>>>                 "last_undegraded": "2017-02-27 16:30:16.230519",
>>>                 "last_fullsized": "2017-02-27 16:30:16.230519",
>>>                 "mapping_epoch": 60601,
>>>                 "log_start": "0'0",
>>>                 "ondisk_log_start": "0'0",
>>>                 "created": 4,
>>>                 "last_epoch_clean": 55143,
>>>                 "parent": "0.0",
>>>                 "parent_split_bits": 0,
>>>                 "last_scrub": "2852'33528",
>>>                 "last_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>                 "last_deep_scrub": "2852'16480",
>>>                 "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
>>>                 "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>                 "log_size": 0,
>>>                 "ondisk_log_size": 0,
>>>                 "stats_invalid": "0",
>>>                 "stat_sum": {
>>>                     "num_bytes": 0,
>>>                     "num_objects": 0,
>>>                     "num_object_clones": 0,
>>>                     "num_object_copies": 0,
>>>                     "num_objects_missing_on_primary": 0,
>>>                     "num_objects_degraded": 0,
>>>                     "num_objects_misplaced": 0,
>>>                     "num_objects_unfound": 0,
>>>                     "num_objects_dirty": 0,
>>>                     "num_whiteouts": 0,
>>>                     "num_read": 0,
>>>                     "num_read_kb": 0,
>>>                     "num_write": 0,
>>>                     "num_write_kb": 0,
>>>                     "num_scrub_errors": 0,
>>>                     "num_shallow_scrub_errors": 0,
>>>                     "num_deep_scrub_errors": 0,
>>>                     "num_objects_recovered": 0,
>>>                     "num_bytes_recovered": 0,
>>>                     "num_keys_recovered": 0,
>>>                     "num_objects_omap": 0,
>>>                     "num_objects_hit_set_archive": 0,
>>>                     "num_bytes_hit_set_archive": 0
>>>                 },
>>>                 "up": [
>>>                     28,
>>>                     35,
>>>                     2
>>>                 ],
>>>                 "acting": [
>>>                     28,
>>>                     35,
>>>                     2
>>>                 ],
>>>                 "blocked_by": [],
>>>                 "up_primary": 28,
>>>                 "acting_primary": 28
>>>             },
>>>             "empty": 1,
>>>             "dne": 0,
>>>             "incomplete": 0,
>>>             "last_epoch_started": 0,
>>>             "hit_set_history": {
>>>                 "current_last_update": "0'0",
>>>                 "current_last_stamp": "0.000000",
>>>                 "current_info": {
>>>                     "begin": "0.000000",
>>>                     "end": "0.000000",
>>>                     "version": "0'0",
>>>                     "using_gmt": "1"
>>>                 },
>>>                 "history": []
>>>             }
>>>         },
>>>
>>> Where can I read more about the meaning of each parameter? Some of
>>> them have quite self-explanatory names, but not all (or probably we
>>> need deeper knowledge to understand them).
>>> Isn't there a parameter that would say when that OSD was assigned to
>>> the given PG? Also, stat_sum shows 0 for all its fields. Why is it
>>> blocking then?
>>>
>>> Is there a way to tell the PG to forget about that OSD?
>>>
>>> Thank you,
>>> Laszlo
>>>
>>>
>>> On 10.03.2017 03:05, Brad Hubbard wrote:
>>>>
>>>>
>>>> Can you explain more about what happened?
>>>>
>>>> The query shows progress is blocked by the following OSDs.
>>>>
>>>>                 "blocked_by": [
>>>>                     14,
>>>>                     17,
>>>>                     51,
>>>>                     58,
>>>>                     63,
>>>>                     64,
>>>>                     68,
>>>>                     70
>>>>                 ],
>>>>
>>>> Some of these OSDs are marked as "dne" (Does Not Exist).
>>>>
>>>> peer": "17",
>>>> "dne": 1,
>>>> "peer": "51",
>>>> "dne": 1,
>>>> "peer": "58",
>>>> "dne": 1,
>>>> "peer": "64",
>>>> "dne": 1,
>>>> "peer": "70",
>>>> "dne": 1,
>>>>
>>>> Can we get a complete background here please?
>>>>
>>>>
>>>> On Thu, Mar 9, 2017 at 10:53 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> After a major network outage our ceph cluster ended up with an inactive
>>>>> PG:
>>>>>
>>>>> # ceph health detail
>>>>> HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck
>>>>> unclean; 1 requests are blocked > 32 sec; 1 osds have slow requests
>>>>> pg 3.367 is stuck inactive for 912263.766607, current state
>>>>> incomplete, last acting [28,35,2]
>>>>> pg 3.367 is stuck unclean for 912263.766688, current state
>>>>> incomplete, last acting [28,35,2]
>>>>> pg 3.367 is incomplete, acting [28,35,2]
>>>>> 1 ops are blocked > 268435 sec
>>>>> 1 ops are blocked > 268435 sec on osd.28
>>>>> 1 osds have slow requests
>>>>>
>>>>> # ceph -s
>>>>>     cluster 6713d1b8-83da-11e6-aa79-525400d98c5a
>>>>>      health HEALTH_WARN
>>>>>             1 pgs incomplete
>>>>>             1 pgs stuck inactive
>>>>>             1 pgs stuck unclean
>>>>>             1 requests are blocked > 32 sec
>>>>>      monmap e3: 3 mons at
>>>>> {tv-dl360-1=10.12.193.73:6789/0,tv-dl360-2=10.12.193.74:6789/0,tv-dl360-3=10.12.193.75:6789/0}
>>>>>             election epoch 72, quorum 0,1,2 tv-dl360-1,tv-dl360-2,tv-dl360-3
>>>>>      osdmap e60609: 72 osds: 72 up, 72 in
>>>>>       pgmap v3670252: 4864 pgs, 11 pools, 134 GB data, 23778 objects
>>>>>             490 GB used, 130 TB / 130 TB avail
>>>>>                 4863 active+clean
>>>>>                    1 incomplete
>>>>>   client io 0 B/s rd, 38465 B/s wr, 2 op/s
>>>>>
>>>>> ceph pg repair doesn't change anything. What should I try to recover
>>>>> it?
>>>>> Attached is the result of ceph pg query on the problem PG.
>>>>>
>>>>> Thank you,
>>>>> Laszlo
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>



-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


