Re: 10.2.4 Jewel released

On Wed, Dec 7, 2016 at 2:58 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Actually, Greg and Sage are working up other branches, nvm.
> -Sam
>
> On Wed, Dec 7, 2016 at 2:52 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> I just pushed a branch wip-14120-10.2.4 with a possible fix.
>>
>> https://github.com/ceph/ceph/pull/12349/ is a fix for a known bug
>> which didn't quite make it into 10.2.4. It's possible that
>> 165e5abdbf6311974d4001e43982b83d06f9e0cc, which did make it in, made the
>> bug much more likely to happen.  wip-14120-10.2.4 has that fix cherry-picked on
>> top of 10.2.4.  Can you try it and let us know the result?
>> -Sam

Sam's explanation is correct given what we have so far. You should use
wip-msgr-jewel-fix to try the backport fix though (freshly-pushed by
Sage so it will be about an hour before it's available to install).

In slightly more detail: you are clearly seeing a problem with the
messenger, as indicated by the sock_recvmsg at the top of the CPU
usage list. We've seen this elsewhere, though very rarely, which is why
a backport was already queued up but the release didn't block on it.
The 15-minute period you're seeing is the default timeout we set on
sockets before we start marking them closed if there's no activity.
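
As a rough illustration of that idle timeout (a sketch only, not the
messenger's actual code): the function name, peer names, and timestamps
below are hypothetical, and only the 15-minute figure comes from this
thread. It probably corresponds to the "ms tcp read timeout" option,
but treat that mapping as an assumption.

```python
# Idle timeout in seconds: the 15-minute default mentioned above.
IDLE_TIMEOUT_SECONDS = 15 * 60


def find_idle_connections(last_activity, now, timeout=IDLE_TIMEOUT_SECONDS):
    """Return (sorted) the peers with no activity for at least `timeout` seconds.

    `last_activity` maps a peer name to the time, in seconds, at which
    traffic was last seen on that connection. Both the mapping and the
    peer names are hypothetical and exist only for this illustration.
    """
    return sorted(
        peer for peer, seen in last_activity.items() if now - seen >= timeout
    )


# Hypothetical peers: only osd.1 has been silent for the full 15 minutes.
last_activity = {"osd.1": 0, "osd.2": 600, "osd.3": 899}
idle_peers = find_idle_connections(last_activity, now=900)
```

Connections returned this way are the ones that would be candidates for
being marked closed once the timeout expires.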

We're not quite sure why it's causing trouble now, although we have
one or two patches we are speculating about and looking into.

This didn't turn up in testing because, as best we can tell, it's a
situation you can only expect to encounter when you have idle TCP
connections between systems (or under fairly artificial network
failures).

On Wed, Dec 7, 2016 at 3:02 PM, Ruben Kerkhof <ruben@xxxxxxxxxxxxxxxx> wrote:
> On Wed, Dec 7, 2016 at 11:58 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> Actually, Greg and Sage are working up other branches, nvm.
>> -Sam
>
> Ok, I'll hold. If the issue is in the SimpleMessenger, would it be
> safe to switch to ms type = async as a workaround?
> I heard that it will become the default in Kraken, but how stable is
> it in Jewel?

Nobody awake right now has any certainty about the state of backports,
so no, don't do that. :(
-Greg