Re: Osd failure detection

On Thu, Nov 9, 2017 at 7:03 PM, David Disseldorp <ddiss@xxxxxxx> wrote:
> On Thu, 9 Nov 2017 17:43:04 +0800, Wei Jin wrote:
>
>> Hi, List,
>>
>> In the Luminous release notes, I noticed the following:
>>
>>  "Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire.  This prevents IO from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running."
>
> I assume you're referring to the ECONNREFUSED-fast-fail functionality
> added by Piotr Dałek.
>

Exactly.
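The fast-fail idea rests on ordinary TCP behaviour. If the host is up but nothing listens on the OSD's port, the host's kernel answers a connect attempt with a reset, so the caller sees ECONNREFUSED after a single round trip. A dead or unreachable host typically gives no answer at all and only reveals itself through a timeout. The standalone sketch below illustrates the two outcomes. It is not Ceph code, and `probe` is a hypothetical helper.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 20.0) -> tuple[str, float]:
    """Hypothetical helper: try one TCP connect and classify the result.

    Returns ("up", elapsed) if the connect succeeds, ("refused", elapsed)
    if the peer kernel rejects it, or ("timeout", elapsed) if nothing answers.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "up", time.monotonic() - start
    except ConnectionRefusedError:
        # Host is alive but no process listens on the port:
        # the failure is known after one round trip.
        return "refused", time.monotonic() - start
    except (socket.timeout, TimeoutError):
        # No reply at all (host down, packets dropped): only a timeout
        # tells us, which is the case a heartbeat grace period covers.
        return "timeout", time.monotonic() - start
```

Against a port with no listener on a live host, `probe` returns `"refused"` almost immediately. Against an address that silently drops packets, it returns `"timeout"` only after `timeout` seconds.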

>> This is critical for us, and we have no plans to upgrade to Luminous yet.
>> Is there any plan to backport it to Jewel? Or does anybody know the related PR or patches? Maybe I could do it myself.
>
> It was backported to Jewel, alongside a bunch of other async messenger
> fixes, and submitted via https://github.com/ceph/ceph/pull/13212 . IIRC,
> there's still a small async messenger leak blocking the PR.
>

Yeah, it is still open and marked DNM (do not merge).
There are also some issues with the async messenger. Since async is not
the default messenger, why not make this available for the simple messenger too?
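Because the refused-connect signal comes from the socket layer, the failure decision it feeds need not depend on which messenger observed it. Below is a minimal, hypothetical sketch of such a tracker. The class and method names are invented for illustration and are not Ceph's API, and the 20-second grace is assumed from the release-note default quoted above. The tracker reports a peer at once on a refused connect and otherwise falls back to the heartbeat grace period.

```python
from dataclasses import dataclass, field

HEARTBEAT_GRACE = 20.0  # seconds; assumed from the release-note default


@dataclass
class PeerFailureTracker:
    """Hypothetical, messenger-agnostic failure tracker (not Ceph code)."""

    grace: float = HEARTBEAT_GRACE
    last_seen: dict[int, float] = field(default_factory=dict)
    reported: set[int] = field(default_factory=set)

    def heartbeat(self, osd: int, now: float) -> None:
        # A heartbeat reply marks the peer alive and clears any earlier report.
        self.last_seen[osd] = now
        self.reported.discard(osd)

    def connection_refused(self, osd: int) -> bool:
        # Fast path: a refused connect means the process is gone; report now.
        return self._report(osd)

    def tick(self, now: float) -> list[int]:
        # Slow path: report peers whose last heartbeat is older than the grace.
        expired = []
        for osd, seen in self.last_seen.items():
            if now - seen > self.grace and self._report(osd):
                expired.append(osd)
        return expired

    def _report(self, osd: int) -> bool:
        # Report each failure once until the peer is heard from again.
        if osd in self.reported:
            return False
        self.reported.add(osd)
        return True
```

In this sketch, a peer whose process died is reported on the first refused connect, while a peer whose host vanished is reported only once `tick` runs past the grace period.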

> Cheers, David
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html