Re: Failing OSDs (suicide timeout) due to flaky clients

Uh, searching for OpTracker in my GitHub emails leads me to
https://github.com/ceph/ceph/pull/7148

I didn't try and trace the backports but there should be links from
the referenced Redmine ticket, or you can search the git logs.
-Greg

On Tue, Jul 5, 2016 at 11:32 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>> On 5 July 2016 at 19:48, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>
>>
>> On Tue, Jul 5, 2016 at 10:45 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>> >
>> >> On 5 July 2016 at 19:27, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> >>
>> >>
>> >> On Tue, Jul 5, 2016 at 2:10 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>> >> >
>> >> >> On 5 July 2016 at 10:56, huang jun <hjwsm1989@xxxxxxxxx> wrote:
>> >> >>
>> >> >>
>> >> >> I have seen OSDs time out many times.
>> >> >> With SimpleMessenger, sending a message makes the PipeConnection
>> >> >> take a lock, which may already be held by another thread.
>> >> >> This has been reported before: http://tracker.ceph.com/issues/9921
>> >> >>
>> >> >
>> >> > Thank you! It surely looks like the same symptoms we are seeing in this cluster.
>> >> >
>> >> > The bug has been marked as resolved, but are you sure it is?
>> >>
>> >> Pretty sure about that bug being done.
>> >>
>> >> The conntrack filling thing sounds vaguely familiar though. Is this
>> >> the latest hammer? I think there were some leaks of messages while
>> >> sending replies that might have blocked up incoming queues that got
>> >> resolved later.
>> >
>> > Keep in mind that it is the conntrack table filling up on the client that results in >50% packet loss on that client.
>> >
>> > The cluster is not firewalled and doesn't do any connection tracking.
>> >
>> > This is hammer 0.94.5. If this is fixed in 0.94.6 or 0.94.7, do you have an idea which commit I should look for? Is it (Simple)Messenger related?
>>
>> If it is one of the op leaks, it'll be in the OSD OpTracker stuff to
>> avoid keeping around message references for tracking purposes and
>> to unblock the client Throttles.
>
> Thanks! I've been looking through the hammer and master branches, but I don't think I found the right commit. I've been searching for 45 minutes now, and nothing has caught my attention.
>
> If you have the time, would you be so kind as to take a look?
>
> Wido
>
>> -Greg
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
