Re: pgs stuck inactive

Brad Hubbard <bhubbard@xxxxxxxxxx> · Fri, 17 Mar 2017 13:38:11 +1000

So I've tested this procedure locally and it works successfully for me.

$ ./ceph -v
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)

$ ./ceph-objectstore-tool import-rados rbd 0.3.export
Importing from pgid 0.3
Write 0/409a1413/benchmark_data_boxen1.example.com_27596_object59/head
Write 0/377af323/benchmark_data_boxen1.example.com_27596_object35/head
Write 0/6448d043/benchmark_data_boxen1.example.com_27596_object73/head
Write 0/410ee8a3/benchmark_data_boxen1.example.com_27596_object52/head
Write 0/c50d4ba3/benchmark_data_boxen1.example.com_27596_object43/head
Write 0/978edd4b/benchmark_data_boxen1.example.com_27596_object56/head
Write 0/97d8967b/benchmark_data_boxen1.example.com_27596_object33/head
Write 0/9c52a59b/benchmark_data_boxen1.example.com_27596_object34/head
Write 0/d762a5db/benchmark_data_boxen1.example.com_27596_object44/head
Import successful

This is, of course, after deleting all copies of the pg on the
relevant OSDs and running force_create_pg to recreate the pg.
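
For reference, the whole sequence can be sketched roughly as below. This
is only a dry run that prints the commands instead of running them. The
plan_pg_recovery helper, the /var/lib/ceph/osd/ceph-<id> paths and the
export file name pattern are my placeholders, not something taken from
your cluster, and each OSD has to be stopped while ceph-objectstore-tool
works on its store.

```shell
#!/usr/bin/env bash
# Dry run: print the commands for the procedure without executing anything.
# Assumed (not from this thread): filestore OSDs under
# /var/lib/ceph/osd/ceph-<id> with the journal at <data path>/journal,
# and the export file name pattern.
# plan_pg_recovery is a hypothetical helper, not part of ceph.

plan_pg_recovery() {
    local pgid=$1 pool=$2 keep_osd=$3
    shift 3                      # remaining arguments: every OSD holding a copy
    local export_file="pg.${pgid}.export.OSD.${keep_osd}"
    local base=/var/lib/ceph/osd/ceph-
    local osd

    # 1. Export the copy you decided to keep (its OSD must be stopped).
    echo "ceph-objectstore-tool --data-path ${base}${keep_osd}" \
         "--journal-path ${base}${keep_osd}/journal" \
         "--pgid ${pgid} --op export --file ${export_file}"

    # 2. Remove every copy with the tool, not by deleting the directory.
    for osd in "$@"; do
        echo "ceph-objectstore-tool --data-path ${base}${osd}" \
             "--journal-path ${base}${osd}/journal" \
             "--pgid ${pgid} --op remove"
    done

    # 3. Recreate the pg empty (all OSDs running again).
    echo "ceph pg force_create_pg ${pgid}"

    # 4. Import the exported objects back through librados.
    echo "ceph-objectstore-tool import-rados ${pool} ${export_file}"
}

plan_pg_recovery 3.367 volumes 35 2 28 35 63
```

Review the printed commands against your own data and journal paths
before running any of them by hand.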

From what I can see in the stack trace, the rados client connection
does not seem to be completing correctly. I'm hoping we can get more
information on the problem by adding the following to the [client]
section of the ceph.conf file on the machine where you are running
ceph-objectstore-tool.

[client]
        debug objecter = 20
        debug rados = 20
        debug ms = 5
        log file = /var/log/ceph/$cluster-$name.$pid.log

Then run ceph-objectstore-tool again, taking careful note of which
file is created in /var/log/ceph/, and upload that file.
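
With the log file setting above, the file name is built from the cluster
name, the client name and the pid, so the simplest way to spot it is to
look for the newest .log file right after the run. A small sketch,
assuming bash and that nothing else writes a log into that directory in
the meantime; newest_log is just a hypothetical helper name:

```shell
#!/usr/bin/env bash
# newest_log is a hypothetical helper: print the most recently modified
# *.log file in a directory (default /var/log/ceph).
newest_log() {
    local dir=${1:-/var/log/ceph}
    local newest=""
    local f
    for f in "$dir"/*.log; do
        [ -e "$f" ] || continue          # glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    if [ -n "$newest" ]; then
        printf '%s\n' "$newest"
    fi
}

# Usage: run the import, then see which file it just wrote, e.g.
#   ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
#   newest_log
```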

On Thu, Mar 16, 2017 at 5:21 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
> My mistake, I've run it on a wrong system ...
>
> I've attached the terminal output.
>
> I've run this on a test system where I was getting the same segfault when
> trying import-rados.
>
> Kind regards,
> Laszlo
>
> On 16.03.2017 07:41, Laszlo Budai wrote:
>>
>>
>> [root@storage2 ~]# gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args
>> ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later
>> <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/bin/ceph-objectstore-tool...Reading symbols from
>> /usr/lib/debug/usr/bin/ceph-objectstore-tool.debug...done.
>> done.
>> Starting program: /usr/bin/ceph-objectstore-tool import-rados volumes
>> pg.3.367.export.OSD.35
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> open: No such file or directory
>> [Inferior 1 (process 23735) exited with code 01]
>> [root@storage2 ~]#
>>
>>
>>
>>
>> Just checked:
>> [root@storage2 lib64]# ls -l /lib64/libthread_db*
>> -rwxr-xr-x. 1 root root 38352 May 12  2016 /lib64/libthread_db-1.0.so
>> lrwxrwxrwx. 1 root root    19 Jun  7  2016 /lib64/libthread_db.so.1 ->
>> libthread_db-1.0.so
>> [root@storage2 lib64]#
>>
>>
>> Kind regards,
>> Laszlo
>>
>>
>> On 16.03.2017 05:26, Brad Hubbard wrote:
>>>
>>> Can you install the debuginfo for ceph (how this works depends on your
>>> distro) and run the following?
>>>
>>> # gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args ceph-objectstore-tool
>>> import-rados volumes pg.3.367.export.OSD.35
>>>
>>> On Thu, Mar 16, 2017 at 12:02 AM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>>
>>>> the ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
>>>> command crashes.
>>>>
>>>> ~# ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
>>>> *** Caught signal (Segmentation fault) **
>>>>  in thread 7f85b60e28c0
>>>>  ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
>>>>  1: ceph-objectstore-tool() [0xaeeaba]
>>>>  2: (()+0x10330) [0x7f85b4dca330]
>>>>  3: (()+0xa2324) [0x7f85b1cd7324]
>>>>  4: (()+0x7d23e) [0x7f85b1cb223e]
>>>>  5: (()+0x7d478) [0x7f85b1cb2478]
>>>>  6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
>>>>  7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15)
>>>> [0x7f85b1c8a0e5]
>>>>  8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
>>>>  9: (main()+0x1294) [0x651134]
>>>>  10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
>>>>  11: ceph-objectstore-tool() [0x66f8b7]
>>>> 2017-03-15 14:57:05.567987 7f85b60e28c0 -1 *** Caught signal (Segmentation fault) **
>>>>  in thread 7f85b60e28c0
>>>>
>>>>  ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
>>>>  1: ceph-objectstore-tool() [0xaeeaba]
>>>>  2: (()+0x10330) [0x7f85b4dca330]
>>>>  3: (()+0xa2324) [0x7f85b1cd7324]
>>>>  4: (()+0x7d23e) [0x7f85b1cb223e]
>>>>  5: (()+0x7d478) [0x7f85b1cb2478]
>>>>  6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
>>>>  7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15)
>>>> [0x7f85b1c8a0e5]
>>>>  8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
>>>>  9: (main()+0x1294) [0x651134]
>>>>  10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
>>>>  11: ceph-objectstore-tool() [0x66f8b7]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>
>>>> --- begin dump of recent events ---
>>>>    -14> 2017-03-15 14:57:05.557743 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command perfcounters_dump hook 0x55e6130
>>>>    -13> 2017-03-15 14:57:05.557807 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command 1 hook 0x55e6130
>>>>    -12> 2017-03-15 14:57:05.557818 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command perf dump hook 0x55e6130
>>>>    -11> 2017-03-15 14:57:05.557828 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command perfcounters_schema hook 0x55e6130
>>>>    -10> 2017-03-15 14:57:05.557836 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command 2 hook 0x55e6130
>>>>     -9> 2017-03-15 14:57:05.557841 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command perf schema hook 0x55e6130
>>>>     -8> 2017-03-15 14:57:05.557851 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command perf reset hook 0x55e6130
>>>>     -7> 2017-03-15 14:57:05.557855 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command config show hook 0x55e6130
>>>>     -6> 2017-03-15 14:57:05.557864 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command config set hook 0x55e6130
>>>>     -5> 2017-03-15 14:57:05.557868 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command config get hook 0x55e6130
>>>>     -4> 2017-03-15 14:57:05.557877 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command config diff hook 0x55e6130
>>>>     -3> 2017-03-15 14:57:05.557880 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command log flush hook 0x55e6130
>>>>     -2> 2017-03-15 14:57:05.557888 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command log dump hook 0x55e6130
>>>>     -1> 2017-03-15 14:57:05.557892 7f85b60e28c0  5 asok(0x5632000)
>>>> register_command log reopen hook 0x55e6130
>>>>      0> 2017-03-15 14:57:05.567987 7f85b60e28c0 -1 *** Caught signal (Segmentation fault) **
>>>>  in thread 7f85b60e28c0
>>>>
>>>>  ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
>>>>  1: ceph-objectstore-tool() [0xaeeaba]
>>>>  2: (()+0x10330) [0x7f85b4dca330]
>>>>  3: (()+0xa2324) [0x7f85b1cd7324]
>>>>  4: (()+0x7d23e) [0x7f85b1cb223e]
>>>>  5: (()+0x7d478) [0x7f85b1cb2478]
>>>>  6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
>>>>  7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15)
>>>> [0x7f85b1c8a0e5]
>>>>  8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
>>>>  9: (main()+0x1294) [0x651134]
>>>>  10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
>>>>  11: ceph-objectstore-tool() [0x66f8b7]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>
>>>> --- logging levels ---
>>>>    0/ 5 none
>>>>    0/ 1 lockdep
>>>>    0/ 1 context
>>>>    1/ 1 crush
>>>>    1/ 5 mds
>>>>    1/ 5 mds_balancer
>>>>    1/ 5 mds_locker
>>>>    1/ 5 mds_log
>>>>    1/ 5 mds_log_expire
>>>>    1/ 5 mds_migrator
>>>>    0/ 1 buffer
>>>>    0/ 1 timer
>>>>    0/ 1 filer
>>>>    0/ 1 striper
>>>>    0/ 1 objecter
>>>>    0/ 5 rados
>>>>    0/ 5 rbd
>>>>    0/ 5 rbd_replay
>>>>    0/ 5 journaler
>>>>    0/ 5 objectcacher
>>>>    0/ 5 client
>>>>    0/ 5 osd
>>>>    0/ 5 optracker
>>>>    0/ 5 objclass
>>>>    1/ 3 filestore
>>>>    1/ 3 keyvaluestore
>>>>    1/ 3 journal
>>>>    0/ 5 ms
>>>>    1/ 5 mon
>>>>    0/10 monc
>>>>    1/ 5 paxos
>>>>    0/ 5 tp
>>>>    1/ 5 auth
>>>>    1/ 5 crypto
>>>>    1/ 1 finisher
>>>>    1/ 5 heartbeatmap
>>>>    1/ 5 perfcounter
>>>>    1/ 5 rgw
>>>>    1/10 civetweb
>>>>    1/ 5 javaclient
>>>>    1/ 5 asok
>>>>    1/ 1 throttle
>>>>    0/ 0 refs
>>>>    1/ 5 xio
>>>>   -2/-2 (syslog threshold)
>>>>   99/99 (stderr threshold)
>>>>   max_recent       500
>>>>   max_new         1000
>>>>   log_file
>>>> --- end dump of recent events ---
>>>> Segmentation fault (core dumped)
>>>> #
>>>>
>>>> Any ideas what to try?
>>>>
>>>> Thank you.
>>>> Laszlo
>>>>
>>>>
>>>> On 15.03.2017 04:27, Brad Hubbard wrote:
>>>>>
>>>>>
>>>>> Decide which copy you want to keep and export that with
>>>>> ceph-objectstore-tool
>>>>>
>>>>> Delete all copies on all OSDs with ceph-objectstore-tool (not by
>>>>> deleting the directory on the disk).
>>>>>
>>>>> Use force_create_pg to recreate the pg empty.
>>>>>
>>>>> Use ceph-objectstore-tool to do a rados import on the exported pg copy.
>>>>>
>>>>>
>>>>> On Wed, Mar 15, 2017 at 12:00 PM, Laszlo Budai
>>>>> <laszlo@xxxxxxxxxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have tried to recover the pg using the following steps:
>>>>>> Preparation:
>>>>>> 1. set noout
>>>>>> 2. stop osd.2
>>>>>> 3. use ceph-objectstore-tool to export from osd.2
>>>>>> 4. start osd.2
>>>>>> 5. repeat steps 2-4 on osd.35, osd.28 and osd.63 (I did this hoping to
>>>>>> be able to use one of those exports to recover the PG)
>>>>>>
>>>>>>
>>>>>> First attempt:
>>>>>>
>>>>>> 1. stop osd.2
>>>>>> 2. remove the 3.367_head directory
>>>>>> 3. start osd.2
>>>>>> Here I was hoping that the cluster would recover the pg from the 2 other
>>>>>> identical OSDs. It did NOT. So I tried the following commands on the PG:
>>>>>> ceph pg repair
>>>>>> ceph pg scrub
>>>>>> ceph pg deep-scrub
>>>>>> ceph pg force_create_pg
>>>>>> Nothing changed. My PG was still incomplete. So I tried to remove it from
>>>>>> all the OSDs that were referenced in the pg query:
>>>>>>
>>>>>>
>>>>>> 1. stop osd.2
>>>>>> 2. delete the 3.367_head directory
>>>>>> 3. start osd.2
>>>>>> 4. repeat steps 1-3 for all the OSDs that were listed in the pg query
>>>>>> 5. did an import from one of the exports. -> I was again able to query the
>>>>>> pg (that was impossible when all the 3.367_head dirs were deleted) and the
>>>>>> stats said that the number of objects is 6 and the size is 21M (all
>>>>>> correct values according to the files I was able to see before starting
>>>>>> the procedure). But the PG is still incomplete.
>>>>>>
>>>>>> What else can I try?
>>>>>>
>>>>>> Thank you,
>>>>>> Laszlo
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12.03.2017 13:06, Brad Hubbard wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Mar 12, 2017 at 7:51 PM, Laszlo Budai
>>>>>>> <laszlo@xxxxxxxxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I have already done the export with ceph_objectstore_tool. I just have to
>>>>>>>> decide which OSDs to keep.
>>>>>>>> Can you tell me why the directory structure in the OSDs is different for
>>>>>>>> the same PG when checking on different OSDs?
>>>>>>>> For instance, in OSD 2 and 63 there are NO subdirectories in
>>>>>>>> 3.367_head, while OSD 28 and 35 contain
>>>>>>>> ./DIR_7/DIR_6/DIR_B/
>>>>>>>> ./DIR_7/DIR_6/DIR_3/
>>>>>>>>
>>>>>>>> When are these subdirectories created?
>>>>>>>>
>>>>>>>> The files are identical on all the OSDs; only the way they are stored
>>>>>>>> is different. It would be enough if you could point me to some
>>>>>>>> documentation that explains these, and I'll read it. So far, searching for
>>>>>>>> the architecture of an OSD, I could not find the gory details about these
>>>>>>>> directories.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/ceph/ceph/blob/master/src/os/filestore/HashIndex.h
>>>>>>>
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Laszlo
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12.03.2017 02:12, Brad Hubbard wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Mar 11, 2017 at 7:43 PM, Laszlo Budai
>>>>>>>>> <laszlo@xxxxxxxxxxxxxxxx>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Thank you for your answer.
>>>>>>>>>>
>>>>>>>>>> indeed the min_size is 1:
>>>>>>>>>>
>>>>>>>>>> # ceph osd pool get volumes size
>>>>>>>>>> size: 3
>>>>>>>>>> # ceph osd pool get volumes min_size
>>>>>>>>>> min_size: 1
>>>>>>>>>> #
>>>>>>>>>> I'm going to try to find the mentioned discussions on the mailing lists
>>>>>>>>>> and read them. If you have a link at hand, it would be nice if you could
>>>>>>>>>> send it to me.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This thread is one example, there are lots more.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In the attached file you can see the contents of the directory containing
>>>>>>>>>> PG data on the different OSDs (all that have appeared in the pg query).
>>>>>>>>>> According to the md5sums the files are identical. What bothers me is the
>>>>>>>>>> directory structure (you can see the ls -R in each dir that contains
>>>>>>>>>> files).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So I mixed up 63 and 68, my list should have read 2, 28, 35 and 63
>>>>>>>>> since 68 is listed as empty in the pg query.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Where can I read about how/why those DIR# subdirectories have appeared?
>>>>>>>>>>
>>>>>>>>>> Given that the files themselves are identical on the "current" OSDs
>>>>>>>>>> belonging to the PG, and as osd.63 (currently not belonging to the PG)
>>>>>>>>>> has the same files, is it safe to stop osd.2, remove the 3.367_head dir,
>>>>>>>>>> and then restart the OSD? (all of this with the noout flag set, of course)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *You* need to decide which is the "good" copy and then follow the
>>>>>>>>> instructions in the links I provided to try and recover the pg. Back
>>>>>>>>> those known copies on 2, 28, 35 and 63 up with the
>>>>>>>>> ceph_objectstore_tool before proceeding. They may well be identical
>>>>>>>>> but the peering process still needs to "see" the relevant logs and
>>>>>>>>> currently something is stopping it doing so.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>> Laszlo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11.03.2017 00:32, Brad Hubbard wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So this is why it happened I guess.
>>>>>>>>>>>
>>>>>>>>>>> pool 3 'volumes' replicated size 3 min_size 1
>>>>>>>>>>>
>>>>>>>>>>> min_size = 1 is a recipe for disasters like this and there are plenty
>>>>>>>>>>> of ML threads about not setting it below 2.
>>>>>>>>>>>
>>>>>>>>>>> The past intervals in the pg query show several intervals where a
>>>>>>>>>>> single OSD may have gone rw.
>>>>>>>>>>>
>>>>>>>>>>> How important is this data?
>>>>>>>>>>>
>>>>>>>>>>> I would suggest checking which of these OSDs actually have the data
>>>>>>>>>>> for this pg. From the pg query it looks like 2, 35 and 68 and possibly
>>>>>>>>>>> 28 since it's the primary. Check all OSDs in the pg query output. I
>>>>>>>>>>> would then back up all copies and work out which copy, if any, you
>>>>>>>>>>> want to keep and then attempt something like the following.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg17820.html
>>>>>>>>>>>
>>>>>>>>>>> If you want to abandon the pg see
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012778.html
>>>>>>>>>>> for a possible solution.
>>>>>>>>>>>
>>>>>>>>>>> http://ceph.com/community/incomplete-pgs-oh-my/ may also give some
>>>>>>>>>>> ideas.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 10, 2017 at 9:44 PM, Laszlo Budai
>>>>>>>>>>> <laszlo@xxxxxxxxxxxxxxxx>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The OSDs are all there.
>>>>>>>>>>>>
>>>>>>>>>>>> $ sudo ceph osd stat
>>>>>>>>>>>>      osdmap e60609: 72 osds: 72 up, 72 in
>>>>>>>>>>>>
>>>>>>>>>>>> and I have attached the result of the ceph osd tree and ceph osd dump
>>>>>>>>>>>> commands.
>>>>>>>>>>>> I got some extra info about the network problem. A faulty network device
>>>>>>>>>>>> flooded the network, eating up all the bandwidth, so the OSDs were not
>>>>>>>>>>>> able to properly communicate with each other. This lasted for almost 1
>>>>>>>>>>>> day.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> Laszlo
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10.03.2017 12:19, Brad Hubbard wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> To me it looks like someone may have done an "rm" on these OSDs but
>>>>>>>>>>>>> not removed them from the crushmap. This does not happen automatically.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do these OSDs show up in "ceph osd tree" and "ceph osd dump"? If so,
>>>>>>>>>>>>> paste the output.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Without knowing what exactly happened here it may be difficult to work
>>>>>>>>>>>>> out how to proceed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to go clean the primary needs to communicate with multiple
>>>>>>>>>>>>> OSDs, some of which are marked DNE and seem to be uncontactable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This seems to be more than a network issue (unless the outage is still
>>>>>>>>>>>>> happening).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://docs.ceph.com/docs/master/rados/operations/pg-states/?highlight=incomplete
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 10, 2017 at 6:09 PM, Laszlo Budai
>>>>>>>>>>>>> <laszlo@xxxxxxxxxxxxxxxx>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was informed that due to a networking issue the ceph cluster network
>>>>>>>>>>>>>> was affected. There was huge packet loss, and network interfaces were
>>>>>>>>>>>>>> flipping. That's all I got.
>>>>>>>>>>>>>> This outage lasted for a long period of time, so I assume that some OSDs
>>>>>>>>>>>>>> may have been considered dead and the data from them was moved away to
>>>>>>>>>>>>>> other PGs (this is what ceph is supposed to do if I'm correct). That was
>>>>>>>>>>>>>> probably the point when the listed PGs came into the picture.
>>>>>>>>>>>>>> From the query we can see this for one of those OSDs:
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>             "peer": "14",
>>>>>>>>>>>>>>             "pgid": "3.367",
>>>>>>>>>>>>>>             "last_update": "0'0",
>>>>>>>>>>>>>>             "last_complete": "0'0",
>>>>>>>>>>>>>>             "log_tail": "0'0",
>>>>>>>>>>>>>>             "last_user_version": 0,
>>>>>>>>>>>>>>             "last_backfill": "MAX",
>>>>>>>>>>>>>>             "purged_snaps": "[]",
>>>>>>>>>>>>>>             "history": {
>>>>>>>>>>>>>>                 "epoch_created": 4,
>>>>>>>>>>>>>>                 "last_epoch_started": 54899,
>>>>>>>>>>>>>>                 "last_epoch_clean": 55143,
>>>>>>>>>>>>>>                 "last_epoch_split": 0,
>>>>>>>>>>>>>>                 "same_up_since": 60603,
>>>>>>>>>>>>>>                 "same_interval_since": 60603,
>>>>>>>>>>>>>>                 "same_primary_since": 60593,
>>>>>>>>>>>>>>                 "last_scrub": "2852'33528",
>>>>>>>>>>>>>>                 "last_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>>>>>>>>>>                 "last_deep_scrub": "2852'16480",
>>>>>>>>>>>>>>                 "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
>>>>>>>>>>>>>>                 "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150"
>>>>>>>>>>>>>>             },
>>>>>>>>>>>>>>             "stats": {
>>>>>>>>>>>>>>                 "version": "0'0",
>>>>>>>>>>>>>>                 "reported_seq": "14",
>>>>>>>>>>>>>>                 "reported_epoch": "59779",
>>>>>>>>>>>>>>                 "state": "down+peering",
>>>>>>>>>>>>>>                 "last_fresh": "2017-02-27 16:30:16.230519",
>>>>>>>>>>>>>>                 "last_change": "2017-02-27 16:30:15.267995",
>>>>>>>>>>>>>>                 "last_active": "0.000000",
>>>>>>>>>>>>>>                 "last_peered": "0.000000",
>>>>>>>>>>>>>>                 "last_clean": "0.000000",
>>>>>>>>>>>>>>                 "last_became_active": "0.000000",
>>>>>>>>>>>>>>                 "last_became_peered": "0.000000",
>>>>>>>>>>>>>>                 "last_unstale": "2017-02-27 16:30:16.230519",
>>>>>>>>>>>>>>                 "last_undegraded": "2017-02-27 16:30:16.230519",
>>>>>>>>>>>>>>                 "last_fullsized": "2017-02-27 16:30:16.230519",
>>>>>>>>>>>>>>                 "mapping_epoch": 60601,
>>>>>>>>>>>>>>                 "log_start": "0'0",
>>>>>>>>>>>>>>                 "ondisk_log_start": "0'0",
>>>>>>>>>>>>>>                 "created": 4,
>>>>>>>>>>>>>>                 "last_epoch_clean": 55143,
>>>>>>>>>>>>>>                 "parent": "0.0",
>>>>>>>>>>>>>>                 "parent_split_bits": 0,
>>>>>>>>>>>>>>                 "last_scrub": "2852'33528",
>>>>>>>>>>>>>>                 "last_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>>>>>>>>>>                 "last_deep_scrub": "2852'16480",
>>>>>>>>>>>>>>                 "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
>>>>>>>>>>>>>>                 "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>>>>>>>>>>                 "log_size": 0,
>>>>>>>>>>>>>>                 "ondisk_log_size": 0,
>>>>>>>>>>>>>>                 "stats_invalid": "0",
>>>>>>>>>>>>>>                 "stat_sum": {
>>>>>>>>>>>>>>                     "num_bytes": 0,
>>>>>>>>>>>>>>                     "num_objects": 0,
>>>>>>>>>>>>>>                     "num_object_clones": 0,
>>>>>>>>>>>>>>                     "num_object_copies": 0,
>>>>>>>>>>>>>>                     "num_objects_missing_on_primary": 0,
>>>>>>>>>>>>>>                     "num_objects_degraded": 0,
>>>>>>>>>>>>>>                     "num_objects_misplaced": 0,
>>>>>>>>>>>>>>                     "num_objects_unfound": 0,
>>>>>>>>>>>>>>                     "num_objects_dirty": 0,
>>>>>>>>>>>>>>                     "num_whiteouts": 0,
>>>>>>>>>>>>>>                     "num_read": 0,
>>>>>>>>>>>>>>                     "num_read_kb": 0,
>>>>>>>>>>>>>>                     "num_write": 0,
>>>>>>>>>>>>>>                     "num_write_kb": 0,
>>>>>>>>>>>>>>                     "num_scrub_errors": 0,
>>>>>>>>>>>>>>                     "num_shallow_scrub_errors": 0,
>>>>>>>>>>>>>>                     "num_deep_scrub_errors": 0,
>>>>>>>>>>>>>>                     "num_objects_recovered": 0,
>>>>>>>>>>>>>>                     "num_bytes_recovered": 0,
>>>>>>>>>>>>>>                     "num_keys_recovered": 0,
>>>>>>>>>>>>>>                     "num_objects_omap": 0,
>>>>>>>>>>>>>>                     "num_objects_hit_set_archive": 0,
>>>>>>>>>>>>>>                     "num_bytes_hit_set_archive": 0
>>>>>>>>>>>>>>                 },
>>>>>>>>>>>>>>                 "up": [
>>>>>>>>>>>>>>                     28,
>>>>>>>>>>>>>>                     35,
>>>>>>>>>>>>>>                     2
>>>>>>>>>>>>>>                 ],
>>>>>>>>>>>>>>                 "acting": [
>>>>>>>>>>>>>>                     28,
>>>>>>>>>>>>>>                     35,
>>>>>>>>>>>>>>                     2
>>>>>>>>>>>>>>                 ],
>>>>>>>>>>>>>>                 "blocked_by": [],
>>>>>>>>>>>>>>                 "up_primary": 28,
>>>>>>>>>>>>>>                 "acting_primary": 28
>>>>>>>>>>>>>>             },
>>>>>>>>>>>>>>             "empty": 1,
>>>>>>>>>>>>>>             "dne": 0,
>>>>>>>>>>>>>>             "incomplete": 0,
>>>>>>>>>>>>>>             "last_epoch_started": 0,
>>>>>>>>>>>>>>             "hit_set_history": {
>>>>>>>>>>>>>>                 "current_last_update": "0'0",
>>>>>>>>>>>>>>                 "current_last_stamp": "0.000000",
>>>>>>>>>>>>>>                 "current_info": {
>>>>>>>>>>>>>>                     "begin": "0.000000",
>>>>>>>>>>>>>>                     "end": "0.000000",
>>>>>>>>>>>>>>                     "version": "0'0",
>>>>>>>>>>>>>>                     "using_gmt": "1"
>>>>>>>>>>>>>>                 },
>>>>>>>>>>>>>>                 "history": []
>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>         },
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Where can I read more about the meaning of each parameter? Some of them
>>>>>>>>>>>>>> have quite self-explanatory names, but not all (or we probably need a
>>>>>>>>>>>>>> deeper knowledge to understand them).
>>>>>>>>>>>>>> Isn't there any parameter that would say when that OSD was assigned to
>>>>>>>>>>>>>> the given PG? Also the stat_sum shows 0 for all its parameters. Why is it
>>>>>>>>>>>>>> blocking then?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there a way to tell the PG to forget about that OSD?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>> Laszlo
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10.03.2017 03:05, Brad Hubbard wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you explain more about what happened?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The query shows progress is blocked by the following OSDs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>                 "blocked_by": [
>>>>>>>>>>>>>>>                     14,
>>>>>>>>>>>>>>>                     17,
>>>>>>>>>>>>>>>                     51,
>>>>>>>>>>>>>>>                     58,
>>>>>>>>>>>>>>>                     63,
>>>>>>>>>>>>>>>                     64,
>>>>>>>>>>>>>>>                     68,
>>>>>>>>>>>>>>>                     70
>>>>>>>>>>>>>>>                 ],
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Some of these OSDs are marked as "dne" (Does Not Exist).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "peer": "17",
>>>>>>>>>>>>>>> "dne": 1,
>>>>>>>>>>>>>>> "peer": "51",
>>>>>>>>>>>>>>> "dne": 1,
>>>>>>>>>>>>>>> "peer": "58",
>>>>>>>>>>>>>>> "dne": 1,
>>>>>>>>>>>>>>> "peer": "64",
>>>>>>>>>>>>>>> "dne": 1,
>>>>>>>>>>>>>>> "peer": "70",
>>>>>>>>>>>>>>> "dne": 1,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can we get a complete background here please?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Mar 9, 2017 at 10:53 PM, Laszlo Budai
>>>>>>>>>>>>>>> <laszlo@xxxxxxxxxxxxxxxx>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After a major network outage our ceph cluster ended up with an inactive PG:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # ceph health detail
>>>>>>>>>>>>>>>> HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean; 1 requests are blocked > 32 sec; 1 osds have slow requests
>>>>>>>>>>>>>>>> pg 3.367 is stuck inactive for 912263.766607, current state incomplete, last acting [28,35,2]
>>>>>>>>>>>>>>>> pg 3.367 is stuck unclean for 912263.766688, current state incomplete, last acting [28,35,2]
>>>>>>>>>>>>>>>> pg 3.367 is incomplete, acting [28,35,2]
>>>>>>>>>>>>>>>> 1 ops are blocked > 268435 sec
>>>>>>>>>>>>>>>> 1 ops are blocked > 268435 sec on osd.28
>>>>>>>>>>>>>>>> 1 osds have slow requests
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # ceph -s
>>>>>>>>>>>>>>>>     cluster 6713d1b8-83da-11e6-aa79-525400d98c5a
>>>>>>>>>>>>>>>>      health HEALTH_WARN
>>>>>>>>>>>>>>>>             1 pgs incomplete
>>>>>>>>>>>>>>>>             1 pgs stuck inactive
>>>>>>>>>>>>>>>>             1 pgs stuck unclean
>>>>>>>>>>>>>>>>             1 requests are blocked > 32 sec
>>>>>>>>>>>>>>>>      monmap e3: 3 mons at {tv-dl360-1=10.12.193.73:6789/0,tv-dl360-2=10.12.193.74:6789/0,tv-dl360-3=10.12.193.75:6789/0}
>>>>>>>>>>>>>>>>             election epoch 72, quorum 0,1,2 tv-dl360-1,tv-dl360-2,tv-dl360-3
>>>>>>>>>>>>>>>>      osdmap e60609: 72 osds: 72 up, 72 in
>>>>>>>>>>>>>>>>       pgmap v3670252: 4864 pgs, 11 pools, 134 GB data, 23778 objects
>>>>>>>>>>>>>>>>             490 GB used, 130 TB / 130 TB avail
>>>>>>>>>>>>>>>>                 4863 active+clean
>>>>>>>>>>>>>>>>                    1 incomplete
>>>>>>>>>>>>>>>>   client io 0 B/s rd, 38465 B/s wr, 2 op/s
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ceph pg repair doesn't change anything. What should I try in order to
>>>>>>>>>>>>>>>> recover it?
>>>>>>>>>>>>>>>> Attached is the result of ceph pg query on the problem PG.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>>> Laszlo
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> ceph-users mailing list
>>>>>>>>>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Cheers,
Brad