Re: Jewel to Kraken OSD upgrade issues

Benjeman Meekhof <bmeekhof@xxxxxxxxx> · Thu, 16 Feb 2017 17:13:02 -0500

Sure, it looks as follows:

ceph -s
    cluster 24b3bf92-299d-426c-ae56-48a995014f04
     health HEALTH_ERR
            775 pgs are stuck inactive for more than 300 seconds
            1954 pgs backfill_wait
            7764 pgs degraded
            517 pgs down
            3 pgs inconsistent
            504 pgs peering
            31 pgs recovering
            5661 pgs recovery_wait
            6976 pgs stuck degraded
            775 pgs stuck inactive
            8362 pgs stuck unclean
            1851 pgs stuck undersized
            1880 pgs undersized
            110 requests are blocked > 32 sec
            recovery 2788277/17810399 objects degraded (15.655%)
            recovery 1846569/17810399 objects misplaced (10.368%)
            recovery 11442/5635366 unfound (0.203%)
            76 scrub errors
     monmap e4: 3 mons at {msu-mon01=207.73.217.13:6789/0,um-mon01=141.211.169.13:6789/0,wsu-mon01=204.39.195.13:6789/0}
            election epoch 23402, quorum 0,1,2 um-mon01,wsu-mon01,msu-mon01
      fsmap e1074: 1/1/1 up {0=wsu-mds01=up:active}, 1 up:standby-replay, 1 up:standby
        mgr active: um-mon01
     osdmap e152705: 627 osds: 475 up, 475 in; 2434 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v13524534: 20864 pgs, 25 pools, 21246 GB data, 5503 kobjects
            59604 GB used, 3397 TB / 3455 TB avail
            2788277/17810399 objects degraded (15.655%)
            1846569/17810399 objects misplaced (10.368%)
            11442/5635366 unfound (0.203%)
               12501 active+clean
                5638 active+recovery_wait+degraded
                1617 active+undersized+degraded+remapped+backfill_wait
                 436 down+remapped+peering
                 258 undersized+degraded+remapped+backfill_wait+peered
                 189 active+degraded
                  79 active+remapped+backfill_wait
                  68 down+peering
                  31 active+recovering+degraded
                  20 active+recovery_wait+degraded+remapped
                  10 down
                   4 active+degraded+remapped
                   3 down+remapped
                   3 active+undersized+degraded+remapped
                   2 active+recovery_wait+undersized+degraded+remapped
                   2 active+remapped
                   1 active+clean+inconsistent
                   1 active+recovery_wait+degraded+inconsistent
                   1 active+degraded+inconsistent
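
For anyone who wants to cross-check the health summary against the PG state breakdown, here is a minimal Python sketch. It is not part of any Ceph tooling; the function name tally_pg_states and the abbreviated SAMPLE text are illustrative assumptions. It only parses the "count state+state" lines as they appear in the pgmap section above.

```python
import re
from collections import Counter

# A few lines copied from the pgmap section above (abbreviated sample).
SAMPLE = """\
               12501 active+clean
                5638 active+recovery_wait+degraded
                 436 down+remapped+peering
                  68 down+peering
                  10 down
                   3 down+remapped
"""

# Matches lines of the form "<count> <flag>[+<flag>...]" and nothing else.
STATE_LINE = re.compile(r"^\s*(\d+)\s+([a-z_]+(?:\+[a-z_]+)*)\s*$")


def tally_pg_states(text):
    """Return (total PGs seen, Counter of PGs per individual state flag)."""
    total = 0
    per_flag = Counter()
    for line in text.splitlines():
        m = STATE_LINE.match(line)
        if not m:
            continue  # skip anything that is not a PG state line
        count = int(m.group(1))
        total += count
        # A PG in "down+peering" counts toward both "down" and "peering".
        for flag in m.group(2).split("+"):
            per_flag[flag] += count
    return total, per_flag


total, per_flag = tally_pg_states(SAMPLE)
print(total, per_flag["down"], per_flag["peering"])
```

On the sample lines this tallies 517 PGs carrying the down flag and 504 carrying peering, which agrees with the "517 pgs down" and "504 pgs peering" health lines.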

On Thu, Feb 16, 2017 at 5:08 PM, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:
> Could you simply run the following?
>
>  * ceph -s
>
> On Fri, Feb 17, 2017 at 6:26 AM, Benjeman Meekhof <bmeekhof@xxxxxxxxx> wrote:
>> Looking at the logs on the OSD mentioned in my previous email, at this
>> point I mostly see this message repeating.  Is this normal, or does it
>> indicate a problem?  This OSD is marked up in the cluster.
>>
>> 2017-02-16 16:23:35.550102 7fc66fce3700 20 osd.564 152609 share_map_peer 0x7fc6887a3000 already has epoch 152609
>> 2017-02-16 16:23:35.556208 7fc66f4e2700 20 osd.564 152609 share_map_peer 0x7fc689e35000 already has epoch 152609
>> 2017-02-16 16:23:35.556233 7fc66f4e2700 20 osd.564 152609 share_map_peer 0x7fc689e35000 already has epoch 152609
>> 2017-02-16 16:23:35.577324 7fc66fce3700 20 osd.564 152609 share_map_peer 0x7fc68f4c1000 already has epoch 152609
>> 2017-02-16 16:23:35.577356 7fc6704e4700 20 osd.564 152609 share_map_peer 0x7fc68f4c1000 already has epoch 152609
>>
>> thanks,
>> Ben
>>
>> On Thu, Feb 16, 2017 at 12:19 PM, Benjeman Meekhof <bmeekhof@xxxxxxxxx> wrote:
>>> I tried starting up just a couple of OSDs with debug_osd = 20 and
>>> debug_filestore = 20.
>>>
>>> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
>>> unusual but maybe someone else sees something in here that is a
>>> problem:  http://pastebin.com/uy8S7hps
>>>
>>> As this log rolls on, our OSD has still not been marked up and is
>>> occupying 100% of a CPU core.  I've done this a couple of times, and
>>> after some hours it is marked up and CPU usage drops.  If more
>>> Kraken OSDs on another host are brought up, the existing Kraken OSDs go
>>> back to maximum CPU usage while PGs recover.  The trend scales
>>> upward as OSDs are started until the system is completely saturated.
>>>
>>> I was reading the docs on async messenger settings at
>>> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
>>> that under 'ms async max op threads' there is a note about one or more
>>> CPUs constantly on 100% load.  As an experiment I set max op threads
>>> to 20 and that is the setting during the period of the pasted log.  It
>>> seems to make no difference.
>>>
>>> Appreciate any thoughts on troubleshooting this.  For the time being
>>> I've aborted our kraken update and will probably re-initialize any
>>> already updated OSD to revert to Jewel except perhaps one host to
>>> continue testing.
>>>
>>> thanks,
>>> Ben
>>>
>>> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>> On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof <bmeekhof@xxxxxxxxx> wrote:
>>>>> Hi all,
>>>>>
>>>>> We encountered an issue updating our OSDs from Jewel (10.2.5) to Kraken
>>>>> (11.2.0).  The OS is a RHEL derivative.  Prior to this we updated all the
>>>>> mons to Kraken.
>>>>>
>>>>> After updating the ceph packages I restarted the 60 OSDs on the box with
>>>>> 'systemctl restart ceph-osd.target'.  Very soon after, the system CPU
>>>>> load flat-lined at 100%, with top showing all of it as system load
>>>>> from ceph-osd processes.  Not long after, the OSDs started flapping due to
>>>>> the load on the system (noout was set at the start, but perhaps
>>>>> unset too quickly after the restart).
>>>>>
>>>>> This was causing problems in the cluster, so we rebooted the box.  The
>>>>> OSDs don't start up/mount automatically, which is not a new problem on
>>>>> this setup.  We ran 'ceph-disk activate $disk' on a list of all the
>>>>> /dev/dm-X devices as output by ceph-disk list.  Everything activated
>>>>> and the CPU gradually climbed to a solid 100% once again.  No OSDs
>>>>> had joined the cluster, so it wasn't causing issues.
>>>>>
>>>>> I leave the box overnight...by the time I leave I see that 1-2 OSD on
>>>>> this box are marked up/in.   By morning all are in, CPU is fine,
>>>>> cluster is still fine.
>>>>>
>>>>> This is not a show-stopping issue now that I know what happens, though
>>>>> it means upgrades are a several-hour or overnight affair.  On the next
>>>>> box I will either mark all the OSDs out before updating and restarting
>>>>> them, or try leaving them up but be sure to set noout to avoid flapping
>>>>> while they churn.
>>>>>
>>>>> Here's a log snippet from one currently spinning in the startup
>>>>> process since 11am.  This is the second box we did, the first
>>>>> experience being as detailed above.  Could this have anything to do
>>>>> with the 'PGs are upgrading' message?
>>>>
>>>> It doesn't seem likely; there's a fixed per-PG overhead that doesn't
>>>> scale with the object count. I could be missing something but I don't
>>>> see anything in the upgrade notes that should be doing this either.
>>>> Try running an upgrade with "debug osd = 20" and "debug filestore =
>>>> 20" set and see what the log spits out.
>>>> -Greg
>>>>
>>>>>
>>>>> 2017-02-14 11:04:07.028311 7fd7a0372940  0 _get_class not permitted to load lua
>>>>> 2017-02-14 11:04:07.077304 7fd7a0372940  0 osd.585 135493 crush map has features 288514119978713088, adjusting msgr requires for clients
>>>>> 2017-02-14 11:04:07.077318 7fd7a0372940  0 osd.585 135493 crush map has features 288514394856620032 was 8705, adjusting msgr requires for mons
>>>>> 2017-02-14 11:04:07.077324 7fd7a0372940  0 osd.585 135493 crush map has features 288514394856620032, adjusting msgr requires for osds
>>>>> 2017-02-14 11:04:09.446832 7fd7a0372940  0 osd.585 135493 load_pgs
>>>>> 2017-02-14 11:04:09.522249 7fd7a0372940 -1 osd.585 135493 PGs are upgrading
>>>>> 2017-02-14 11:04:10.246166 7fd7a0372940  0 osd.585 135493 load_pgs opened 148 pgs
>>>>> 2017-02-14 11:04:10.246249 7fd7a0372940  0 osd.585 135493 using 1 op queue with priority op cut off at 64.
>>>>> 2017-02-14 11:04:10.256299 7fd7a0372940 -1 osd.585 135493 log_to_monitors {default=true}
>>>>> 2017-02-14 11:04:12.473450 7fd7a0372940  0 osd.585 135493 done with init, starting boot process
>>>>> (logs stop here, cpu spinning)
>>>>>
>>>>>
>>>>> regards,
>>>>> Ben
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com