Re: OSDs not coming up on one host

On Wed, Jun 15, 2016 at 10:21 AM, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
> Hello Jacob, Gregory,
>
> did you manage to start up those OSDs in the end? I came across a very
> similar incident [1] (though with no flags preventing the OSDs from
> getting UP in the cluster, and no hardware problems reported) and I
> wonder whether you found out what the culprit was in your case.
>
> [1] http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/30432

Nope, never heard back. That said, it's not clear from your
description whether these are actually the same problem; if they are,
you need to provide monitor logs before anybody can help. If they
aren't, you are skipping steps and need to include OSD logs and such. ;)
-Greg

>
> Best regards,
> Kostis
>
> On 17 April 2015 at 02:04, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> The monitor looks like it's not generating a new OSDMap including the
>> booting OSDs. I could say with more certainty what's going on if I had
>> the monitor log file, but I'm betting you've got one of the noin or
>> noup family of flags set. I *think* these are output in "ceph -w" or in
>> "ceph osd dump", although I can't say for certain in Firefly.
>> -Greg
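
[A quick way to check Greg's hunch is to look for boot-blocking flags in the cluster state. This is a hedged sketch: the sample `flags` line below is made up for illustration; on a real cluster you would inspect the output of `ceph osd dump | grep flags` instead.]

```shell
# Sample flags line in the format `ceph osd dump` prints (hypothetical
# values for illustration; substitute the real command's output).
flags_line="flags noup,noin,sortbitwise"

# noup prevents OSDs from being marked up; noin prevents them from
# being marked in. Either will leave booting OSDs stuck.
case "$flags_line" in
  *noup*|*noin*) echo "boot-blocking flag set" ;;
  *)             echo "no boot-blocking flags" ;;
esac
```

If a flag turns out to be set, `ceph osd unset noup` (and/or `ceph osd unset noin`) clears it.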
>>
>> On Fri, Apr 10, 2015 at 1:57 AM, Jacob Reid <lists-ceph@xxxxxxxxxxxxxxxx> wrote:
>>> On Fri, Apr 10, 2015 at 09:55:20AM +0100, Jacob Reid wrote:
>>>> On Thu, Apr 09, 2015 at 05:21:47PM +0100, Jacob Reid wrote:
>>>> > On Thu, Apr 09, 2015 at 08:46:07AM -0700, Gregory Farnum wrote:
>>>> > > On Thu, Apr 9, 2015 at 8:14 AM, Jacob Reid <lists-ceph@xxxxxxxxxxxxxxxx> wrote:
>>>> > > > On Thu, Apr 09, 2015 at 06:43:45AM -0700, Gregory Farnum wrote:
>>>> > > >> You can turn up debugging ("debug osd = 10" and "debug filestore = 10"
>>>> > > >> are probably enough, or maybe 20 each) and see what comes out to get
>>>> > > >> more information about why the threads are stuck.
>>>> > > >>
>>>> > > >> But just from the log my answer is the same as before, and now I don't
>>>> > > >> trust that controller (or maybe its disks), regardless of what it's
>>>> > > >> admitting to. ;)
>>>> > > >> -Greg
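
[For reference, the debug levels Greg mentions go in ceph.conf; a minimal fragment, assuming the standard section layout:]

```ini
[osd]
    debug osd = 20
    debug filestore = 20
```

The same levels can be applied to a running daemon without a restart, e.g. `ceph tell osd.15 injectargs '--debug-osd 20 --debug-filestore 20'` (osd.15 taken from the log excerpt below; substitute your own OSD id).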
>>>> > > >>
>>>> > > >
>>>> > > > Ran with osd and filestore debug both at 20; still nothing jumping out at me. Logfile attached, as it got huge fairly quickly, but it mostly seems to be the same extra lines. I tried running some test I/O on the drives in question to try to provoke some kind of problem, but they seem fine now...
>>>> > >
>>>> > > Okay, this is strange. Something very wonky is happening with your
>>>> > > scheduler — it looks like these threads are all idle, and they're
>>>> > > scheduling wakeups that happen an appreciable amount of time after
>>>> > > they're supposed to. For instance:
>>>> > > 2015-04-09 15:56:55.953116 7f70a7963700 20
>>>> > > filestore(/var/lib/ceph/osd/osd.15) sync_entry woke after 5.416704
>>>> > > 2015-04-09 15:56:55.953153 7f70a7963700 20
>>>> > > filestore(/var/lib/ceph/osd/osd.15) sync_entry waiting for
>>>> > > max_interval 5.000000
>>>> > >
>>>> > > This is the thread that syncs your backing store, and it always sets
>>>> > > itself to get woken up at 5-second intervals — but here it took >5.4
>>>> > > seconds, and later on in your log it takes more than 6 seconds.
>>>> > > It looks like all the threads which are getting timed out are also
>>>> > > idle, but are taking so much longer to wake up than they're set for
>>>> > > that they get a timeout warning.
>>>> > >
>>>> > > There might be some bugs in here where we're expecting wakeups to be
>>>> > > more precise than they can be, but these sorts of misses are
>>>> > > definitely not normal. Is this server overloaded on the CPU? Have you
>>>> > > done something to make the scheduler or wakeups wonky?
>>>> > > -Greg
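
[The mechanism Greg describes can be illustrated outside Ceph. This is a minimal Python sketch, not Ceph code; the 0.2 s interval stands in for sync_entry's 5 s max_interval:]

```python
import threading
import time

# A sync_entry-style loop body: wait on a condition variable with a
# timeout and measure how long after the requested interval the wakeup
# actually lands. On a healthy box the overshoot is a few milliseconds;
# overshoots of hundreds of milliseconds or more, as in the log above,
# point at scheduler or power-management trouble.
max_interval = 0.2   # stand-in for Ceph's 5-second sync interval
cond = threading.Condition()

with cond:
    start = time.monotonic()
    cond.wait(timeout=max_interval)   # nobody notifies, so this times out
    elapsed = time.monotonic() - start

print(f"sync_entry woke after {elapsed:.6f} "
      f"(waiting for max_interval {max_interval:.6f})")
```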
>>>> >
>>>> > CPU load is minimal - the host does nothing but run OSDs and has 8 cores that are all sitting idle with a load average of 0.1. I haven't done anything to scheduling; the debug logging was on, if that could be the cause of any delays. A scheduler issue seems possible: `time sleep 5` run a few times returns anything spread randomly from 5.002 to 7.1(!) seconds, but mostly in the 5.5-6.0 region, whereas it managed fairly consistently <5.2 on the other servers in the cluster and <5.02 on my desktop. I have disabled the CPU power-saving mode, as that was the only thing I could think of that might be having an effect, and running the same test again gives more sane results... we'll see whether this is reflected in the OSD logs, I guess. If this is the cause, it's probably something the next version might want to detect and warn about specifically. I will keep you updated on their behaviour...
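
[The drift test above is easy to reproduce; a sketch with a shorter interval so it finishes quickly, plus a check of the cpufreq governor, which is one plausible power-saving culprit (the sysfs path may vary by kernel and distro):]

```shell
# Time a 1-second sleep in nanoseconds (GNU date). Overshoot beyond a
# few tens of milliseconds on an idle box suggests scheduler or
# power-management trouble, matching the `time sleep 5` results above.
start=$(date +%s%N)
sleep 1
end=$(date +%s%N)
echo "asked 1000 ms, slept $(( (end - start) / 1000000 )) ms"

# The active cpufreq governor is worth checking; "powersave" or
# "ondemand" with aggressive C-states can delay timer wakeups.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null \
  || echo "cpufreq sysfs not available on this system"
```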
>>>> > _______________________________________________
>>>> > ceph-users mailing list
>>>> > ceph-users@xxxxxxxxxxxxxx
>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>> Overnight, nothing changed - I am no longer seeing the timeouts in the logs, but all the OSDs in question are still happily sitting at "booting" and showing as down in the tree. Debug 20 logfile attached again.
>>> ...and here actually *is* the logfile, which I managed to forget... must be Friday, I guess.
>>>
>>>



