Re: Jewel to Kraken OSD upgrade issues

On Thu, Feb 16, 2017 at 9:19 AM, Benjeman Meekhof <bmeekhof@xxxxxxxxx> wrote:
> I tried starting up just a couple of OSDs with debug_osd = 20 and
> debug_filestore = 20.
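
(For reference, a minimal sketch of how those are usually set for a
boot-time trace like this - in ceph.conf on the OSD host:

    [osd]
    debug osd = 20
    debug filestore = 20

or, for a daemon that's already running, something along the lines of
"ceph tell osd.585 injectargs '--debug-osd 20 --debug-filestore 20'",
which only lasts until the next restart. osd.585 is just the example
from the log further down the thread.)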
>
> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
> unusual but maybe someone else sees something in here that is a
> problem:  http://pastebin.com/uy8S7hps
>
> As this log rolls on, our OSD has still not been marked up and is
> occupying 100% of a CPU core.  I've done this a couple of times, and
> in a matter of some hours it gets marked up and the CPU drops.  If
> more Kraken OSDs on another host are brought up, the existing Kraken
> OSDs go back to max CPU usage while PGs recover.  The trend scales
> upward as OSDs are started, until the system is completely saturated.
>
> I was reading the docs on async messenger settings at
> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
> that under 'ms async max op threads' there is a note about one or
> more CPUs being constantly at 100% load.  As an experiment I set max
> op threads to 20, and that is the setting in effect during the period
> of the pasted log.  It seems to make no difference.
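
(If that was set via ceph.conf rather than injected, it would look
roughly like this, using the option name from the ms-ref page above:

    [osd]
    ms async max op threads = 20

The default is noticeably lower - 5, if memory serves - so 20 is
already generous enough to rule that knob out.)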
>
> I'd appreciate any thoughts on troubleshooting this.  For the time
> being I've aborted our Kraken update and will probably re-initialize
> any already-updated OSDs to revert them to Jewel, except perhaps one
> host to continue testing with.

Ah, that log looks like you're just generating OSDMaps so quickly that
rebooting 60 at a time leaves you with a ludicrous number to churn
through, and that takes a while. It would have been exacerbated by
having 60 daemons fight for the CPU to process them, leading to
flapping.
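
If you want to sanity-check that theory, compare how far a booting OSD
has gotten through its maps against the cluster's current epoch;
something like this, assuming the admin socket still responds while
the daemon is busy (run the first command on the host where the OSD
lives):

    ceph daemon osd.585 status   # look at oldest_map / newest_map
    ceph osd stat                # current osdmap epoch for the cluster

If newest_map keeps crawling toward the cluster epoch while the CPU
spins, it's just map catch-up rather than a hang.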

You might try restarting daemons sequentially on the node instead of
all at once. Depending on your needs, it would be even cheaper to set
the nodown flag, though obviously that will impede I/O while it
happens.
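
A minimal sketch of the sequential approach, assuming the standard
systemd units and that the OSD IDs on the node can be read from
/var/lib/ceph/osd (the 60-second pause is arbitrary; adjust to taste):

    ceph osd set nodown              # and/or: ceph osd set noout
    for id in $(ls /var/lib/ceph/osd | sed 's/^ceph-//'); do
        systemctl restart ceph-osd@"$id"
        sleep 60                     # let each daemon chew through maps
    done
    ceph osd unset nodown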

I'd be concerned that this demonstrates you don't have enough CPU
power per daemon, though.
-Greg

>
> thanks,
> Ben
>
> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof <bmeekhof@xxxxxxxxx> wrote:
>>> Hi all,
>>>
>>> We encountered an issue updating our OSDs from Jewel (10.2.5) to
>>> Kraken (11.2.0).  The OS is a RHEL derivative.  Prior to this we had
>>> updated all the mons to Kraken.
>>>
>>> After updating the ceph packages I restarted the 60 OSDs on the box
>>> with 'systemctl restart ceph-osd.target'.  Very soon after, the
>>> system CPU load flat-lines at 100%, with top showing all of it being
>>> system load from ceph-osd processes.  Not long after, we get OSD
>>> flapping due to the load on the system (noout was set at the start
>>> of this, but perhaps unset too quickly after the restart).
>>>
>>> This causes problems in the cluster, so we reboot the box.  The OSDs
>>> don't start up/mount automatically - not a new problem on this
>>> setup.  We run 'ceph-disk activate $disk' on a list of all the
>>> /dev/dm-X devices as output by ceph-disk list.  Everything activates
>>> and the CPU gradually climbs back to a solid 100%.  No OSDs have
>>> joined the cluster yet, so it isn't causing issues.
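
(For reference, a loop of roughly the kind described; the awk pattern
is an assumption about how 'ceph-disk list' labels data partitions on
this particular setup, so check its output before running anything:

    ceph-disk list | awk '/ceph data/ {print $1}' | while read disk; do
        ceph-disk activate "$disk"
    done

That's just reconstructing the manual step above, not a recommendation
to script it blindly.)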
>>>
>>> I leave the box overnight... by the time I leave I see that 1-2 OSDs
>>> on this box are marked up/in.  By morning all are in, the CPU is
>>> fine, and the cluster is still fine.
>>>
>>> This is not a show-stopping issue now that I know what happens,
>>> though it means upgrades are a several-hour or overnight affair.  On
>>> the next box I will just mark all the OSDs out before updating and
>>> restarting them, or try leaving them up but be sure to set noout to
>>> avoid flapping while they churn.
>>>
>>> Here's a log snippet from one that has been spinning in the startup
>>> process since 11am.  This is the second box we did, the first
>>> experience being as detailed above.  Could this have anything to do
>>> with the 'PGs are upgrading' message?
>>
>> It doesn't seem likely — there's a fixed per-PG overhead that doesn't
>> scale with the object count. I could be missing something but I don't
>> see anything in the upgrade notes that should be doing this either.
>> Try running an upgrade with "debug osd = 20" and "debug filestore =
>> 20" set and see what the log spits out.
>> -Greg
>>
>>>
>>> 2017-02-14 11:04:07.028311 7fd7a0372940  0 _get_class not permitted to load lua
>>> 2017-02-14 11:04:07.077304 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514119978713088, adjusting msgr requires for clients
>>> 2017-02-14 11:04:07.077318 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514394856620032 was 8705, adjusting msgr requires for
>>> mons
>>> 2017-02-14 11:04:07.077324 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514394856620032, adjusting msgr requires for osds
>>> 2017-02-14 11:04:09.446832 7fd7a0372940  0 osd.585 135493 load_pgs
>>> 2017-02-14 11:04:09.522249 7fd7a0372940 -1 osd.585 135493 PGs are upgrading
>>> 2017-02-14 11:04:10.246166 7fd7a0372940  0 osd.585 135493 load_pgs
>>> opened 148 pgs
>>> 2017-02-14 11:04:10.246249 7fd7a0372940  0 osd.585 135493 using 1 op
>>> queue with priority op cut off at 64.
>>> 2017-02-14 11:04:10.256299 7fd7a0372940 -1 osd.585 135493
>>> log_to_monitors {default=true}
>>> 2017-02-14 11:04:12.473450 7fd7a0372940  0 osd.585 135493 done with
>>> init, starting boot process
>>> (logs stop here, cpu spinning)
>>>
>>>
>>> regards,
>>> Ben



