On Sat, Mar 11, 2017 at 7:43 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> Thank you for your answer.
>
> Indeed the min_size is 1:
>
> # ceph osd pool get volumes size
> size: 3
> # ceph osd pool get volumes min_size
> min_size: 1
> #
>
> I'm going to try to find the mentioned discussions on the mailing lists
> and read them. If you have a link at hand, it would be nice if you could
> send it to me.

This thread is one example, there are lots more.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html

> In the attached file you can see the contents of the directory containing
> PG data on the different OSDs (all that have appeared in the pg query).
> According to the md5sums the files are identical. What bothers me is the
> directory structure (you can see the ls -R in each dir that contains files).

So I mixed up 63 and 68, my list should have read 2, 28, 35 and 63 since
68 is listed as empty in the pg query.

> Where can I read about how/why those DIR# subdirectories have appeared?
>
> Given that the files themselves are identical on the "current" OSDs
> belonging to the PG, and as osd.63 (currently not belonging to the PG)
> has the same files, is it safe to stop osd.2, remove the 3.367_head dir,
> and then restart the OSD? (all this with the noout flag set, of course)

*You* need to decide which is the "good" copy and then follow the
instructions in the links I provided to try and recover the pg. Back those
known copies on 2, 28, 35 and 63 up with the ceph_objectstore_tool before
proceeding. They may well be identical, but the peering process still needs
to "see" the relevant logs and currently something is stopping it from
doing so.

> Kind regards,
> Laszlo
>
> On 11.03.2017 00:32, Brad Hubbard wrote:
>>
>> So this is why it happened I guess.
>>
>> pool 3 'volumes' replicated size 3 min_size 1
>>
>> min_size = 1 is a recipe for disasters like this and there are plenty
>> of ML threads about not setting it below 2.
>>
>> The past intervals in the pg query show several intervals where a
>> single OSD may have gone rw.
>>
>> How important is this data?
>>
>> I would suggest checking which of these OSDs actually have the data
>> for this pg. From the pg query it looks like 2, 35 and 68, and possibly
>> 28 since it's the primary. Check all OSDs in the pg query output. I
>> would then back up all copies, work out which copy, if any, you want to
>> keep, and then attempt something like the following.
>>
>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg17820.html
>>
>> If you want to abandon the pg see
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012778.html
>> for a possible solution.
>>
>> http://ceph.com/community/incomplete-pgs-oh-my/ may also give some ideas.
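For reference, the ceph_objectstore_tool backup suggested above might look
something like the sketch below. It is only a sketch: it assumes a default
Filestore data path and systemd service names, the binary may be spelled
ceph-objectstore-tool depending on the release, and the export file name is
just an example. Repeat it with the right OSD id on each host that holds a
copy (2, 28, 35 and 63):

  ceph osd set noout              # keep the cluster from rebalancing while the OSD is down
  systemctl stop ceph-osd@2       # the tool needs exclusive access to the OSD's store
  ceph-objectstore-tool --op export --pgid 3.367 \
      --data-path /var/lib/ceph/osd/ceph-2 \
      --journal-path /var/lib/ceph/osd/ceph-2/journal \
      --file /root/pg3.367-osd2.export
  systemctl start ceph-osd@2
  ceph osd unset noout

Such an export can later be loaded back into an OSD with --op import if a
recovery attempt makes things worse.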
>> On Fri, Mar 10, 2017 at 9:44 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> The OSDs are all there.
>>>
>>> $ sudo ceph osd stat
>>>      osdmap e60609: 72 osds: 72 up, 72 in
>>>
>>> and I have attached the results of the ceph osd tree and ceph osd dump
>>> commands.
>>> I got some extra info about the network problem: a faulty network device
>>> flooded the network, eating up all the bandwidth, so the OSDs were not
>>> able to properly communicate with each other. This lasted for almost
>>> 1 day.
>>>
>>> Thank you,
>>> Laszlo
>>>
>>> On 10.03.2017 12:19, Brad Hubbard wrote:
>>>>
>>>> To me it looks like someone may have done an "rm" on these OSDs but
>>>> not removed them from the crushmap. This does not happen automatically.
>>>>
>>>> Do these OSDs show up in "ceph osd tree" and "ceph osd dump"? If so,
>>>> paste the output.
>>>>
>>>> Without knowing what exactly happened here it may be difficult to work
>>>> out how to proceed.
>>>>
>>>> In order to go clean the primary needs to communicate with multiple
>>>> OSDs, some of which are marked DNE and seem to be uncontactable.
>>>>
>>>> This seems to be more than a network issue (unless the outage is still
>>>> happening).
>>>>
>>>> http://docs.ceph.com/docs/master/rados/operations/pg-states/?highlight=incomplete
>>>>
>>>> On Fri, Mar 10, 2017 at 6:09 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I was informed that the ceph cluster network was affected by a
>>>>> networking issue: there was huge packet loss, and network interfaces
>>>>> were flapping. That's all I got.
>>>>> This outage lasted a longer period of time, so I assume that some OSDs
>>>>> may have been considered dead and the data from them moved away to
>>>>> other OSDs (this is what ceph is supposed to do, if I'm correct).
>>>>> Probably that was the point when the listed PGs came into the picture.
>>>>> From the query we can see this for one of those OSDs:
>>>>>
>>>>>     {
>>>>>         "peer": "14",
>>>>>         "pgid": "3.367",
>>>>>         "last_update": "0'0",
>>>>>         "last_complete": "0'0",
>>>>>         "log_tail": "0'0",
>>>>>         "last_user_version": 0,
>>>>>         "last_backfill": "MAX",
>>>>>         "purged_snaps": "[]",
>>>>>         "history": {
>>>>>             "epoch_created": 4,
>>>>>             "last_epoch_started": 54899,
>>>>>             "last_epoch_clean": 55143,
>>>>>             "last_epoch_split": 0,
>>>>>             "same_up_since": 60603,
>>>>>             "same_interval_since": 60603,
>>>>>             "same_primary_since": 60593,
>>>>>             "last_scrub": "2852'33528",
>>>>>             "last_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>             "last_deep_scrub": "2852'16480",
>>>>>             "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
>>>>>             "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150"
>>>>>         },
>>>>>         "stats": {
>>>>>             "version": "0'0",
>>>>>             "reported_seq": "14",
>>>>>             "reported_epoch": "59779",
>>>>>             "state": "down+peering",
>>>>>             "last_fresh": "2017-02-27 16:30:16.230519",
>>>>>             "last_change": "2017-02-27 16:30:15.267995",
>>>>>             "last_active": "0.000000",
>>>>>             "last_peered": "0.000000",
>>>>>             "last_clean": "0.000000",
>>>>>             "last_became_active": "0.000000",
>>>>>             "last_became_peered": "0.000000",
>>>>>             "last_unstale": "2017-02-27 16:30:16.230519",
>>>>>             "last_undegraded": "2017-02-27 16:30:16.230519",
>>>>>             "last_fullsized": "2017-02-27 16:30:16.230519",
>>>>>             "mapping_epoch": 60601,
>>>>>             "log_start": "0'0",
>>>>>             "ondisk_log_start": "0'0",
>>>>>             "created": 4,
>>>>>             "last_epoch_clean": 55143,
>>>>>             "parent": "0.0",
>>>>>             "parent_split_bits": 0,
>>>>>             "last_scrub": "2852'33528",
>>>>>             "last_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>             "last_deep_scrub": "2852'16480",
>>>>>             "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
>>>>>             "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>             "log_size": 0,
>>>>>             "ondisk_log_size": 0,
>>>>>             "stats_invalid": "0",
>>>>>             "stat_sum": {
>>>>>                 "num_bytes": 0,
>>>>>                 "num_objects": 0,
>>>>>                 "num_object_clones": 0,
>>>>>                 "num_object_copies": 0,
>>>>>                 "num_objects_missing_on_primary": 0,
>>>>>                 "num_objects_degraded": 0,
>>>>>                 "num_objects_misplaced": 0,
>>>>>                 "num_objects_unfound": 0,
>>>>>                 "num_objects_dirty": 0,
>>>>>                 "num_whiteouts": 0,
>>>>>                 "num_read": 0,
>>>>>                 "num_read_kb": 0,
>>>>>                 "num_write": 0,
>>>>>                 "num_write_kb": 0,
>>>>>                 "num_scrub_errors": 0,
>>>>>                 "num_shallow_scrub_errors": 0,
>>>>>                 "num_deep_scrub_errors": 0,
>>>>>                 "num_objects_recovered": 0,
>>>>>                 "num_bytes_recovered": 0,
>>>>>                 "num_keys_recovered": 0,
>>>>>                 "num_objects_omap": 0,
>>>>>                 "num_objects_hit_set_archive": 0,
>>>>>                 "num_bytes_hit_set_archive": 0
>>>>>             },
>>>>>             "up": [
>>>>>                 28,
>>>>>                 35,
>>>>>                 2
>>>>>             ],
>>>>>             "acting": [
>>>>>                 28,
>>>>>                 35,
>>>>>                 2
>>>>>             ],
>>>>>             "blocked_by": [],
>>>>>             "up_primary": 28,
>>>>>             "acting_primary": 28
>>>>>         },
>>>>>         "empty": 1,
>>>>>         "dne": 0,
>>>>>         "incomplete": 0,
>>>>>         "last_epoch_started": 0,
>>>>>         "hit_set_history": {
>>>>>             "current_last_update": "0'0",
>>>>>             "current_last_stamp": "0.000000",
>>>>>             "current_info": {
>>>>>                 "begin": "0.000000",
>>>>>                 "end": "0.000000",
>>>>>                 "version": "0'0",
>>>>>                 "using_gmt": "1"
>>>>>             },
>>>>>             "history": []
>>>>>         }
>>>>>     },
>>>>>
>>>>> Where can I read more about the meaning of each parameter? Some of them
>>>>> have quite self-explanatory names, but not all (or probably we need
>>>>> deeper knowledge to understand them).
>>>>> Isn't there any parameter that would say when that OSD was assigned to
>>>>> the given PG? Also the stat_sum shows 0 for all its parameters. Why is
>>>>> it blocking then?
>>>>>
>>>>> Is there a way to tell the PG to forget about that OSD?
>>>>>
>>>>> Thank you,
>>>>> Laszlo
>>>>>
>>>>> On 10.03.2017 03:05, Brad Hubbard wrote:
>>>>>>
>>>>>> Can you explain more about what happened?
>>>>>>
>>>>>> The query shows progress is blocked by the following OSDs.
>>>>>>
>>>>>>     "blocked_by": [
>>>>>>         14,
>>>>>>         17,
>>>>>>         51,
>>>>>>         58,
>>>>>>         63,
>>>>>>         64,
>>>>>>         68,
>>>>>>         70
>>>>>>     ],
>>>>>>
>>>>>> Some of these OSDs are marked as "dne" (Does Not Exist).
>>>>>>
>>>>>>     "peer": "17",
>>>>>>     "dne": 1,
>>>>>>     "peer": "51",
>>>>>>     "dne": 1,
>>>>>>     "peer": "58",
>>>>>>     "dne": 1,
>>>>>>     "peer": "64",
>>>>>>     "dne": 1,
>>>>>>     "peer": "70",
>>>>>>     "dne": 1,
>>>>>>
>>>>>> Can we get a complete background here please?
>>>>>> On Thu, Mar 9, 2017 at 10:53 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> After a major network outage our ceph cluster ended up with an
>>>>>>> inactive PG:
>>>>>>>
>>>>>>> # ceph health detail
>>>>>>> HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean;
>>>>>>> 1 requests are blocked > 32 sec; 1 osds have slow requests
>>>>>>> pg 3.367 is stuck inactive for 912263.766607, current state incomplete,
>>>>>>> last acting [28,35,2]
>>>>>>> pg 3.367 is stuck unclean for 912263.766688, current state incomplete,
>>>>>>> last acting [28,35,2]
>>>>>>> pg 3.367 is incomplete, acting [28,35,2]
>>>>>>> 1 ops are blocked > 268435 sec
>>>>>>> 1 ops are blocked > 268435 sec on osd.28
>>>>>>> 1 osds have slow requests
>>>>>>>
>>>>>>> # ceph -s
>>>>>>>     cluster 6713d1b8-83da-11e6-aa79-525400d98c5a
>>>>>>>      health HEALTH_WARN
>>>>>>>             1 pgs incomplete
>>>>>>>             1 pgs stuck inactive
>>>>>>>             1 pgs stuck unclean
>>>>>>>             1 requests are blocked > 32 sec
>>>>>>>      monmap e3: 3 mons at {tv-dl360-1=10.12.193.73:6789/0,tv-dl360-2=10.12.193.74:6789/0,tv-dl360-3=10.12.193.75:6789/0}
>>>>>>>             election epoch 72, quorum 0,1,2 tv-dl360-1,tv-dl360-2,tv-dl360-3
>>>>>>>      osdmap e60609: 72 osds: 72 up, 72 in
>>>>>>>       pgmap v3670252: 4864 pgs, 11 pools, 134 GB data, 23778 objects
>>>>>>>             490 GB used, 130 TB / 130 TB avail
>>>>>>>                 4863 active+clean
>>>>>>>                    1 incomplete
>>>>>>>   client io 0 B/s rd, 38465 B/s wr, 2 op/s
>>>>>>>
>>>>>>> ceph pg repair doesn't change anything. What should I try to recover it?
>>>>>>> Attached is the result of ceph pg query on the problem PG.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Laszlo
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ceph-users mailing list
>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com