On Apr 3, 2018, at 9:20 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> On Tue, 3 Apr 2018, cgxu519@xxxxxxx wrote:
>>> On Apr 3, 2018, at 1:56 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>
>>> On Fri, Mar 30, 2018 at 11:47 PM, xiangyang yu <penglaiyxy@xxxxxxxxx> wrote:
>>>> Hi cephers,
>>>>
>>>> Recently we hit a big problem in our production Ceph cluster. It had
>>>> been running very well for one and a half years.
>>>>
>>>> The RBD client network and the Ceph public network are different,
>>>> communicating through a router.
>>>>
>>>> Our Ceph version is 0.94.5, and our IO transport uses SimpleMessenger.
>>>>
>>>> Yesterday some of our VMs (using qemu librbd) could not send IO to the
>>>> Ceph cluster.
>>>>
>>>> Ceph status was healthy: no OSDs went up or down and no PGs were
>>>> inactive or down.
>>>>
>>>> When we exported an RBD image through rbd export, we found the RBD
>>>> client could not connect to one OSD, say osd.34.
>>>>
>>>> We found that osd.34 was up and running, but in its log we saw the
>>>> following error, repeated continuously:
>>>>
>>>> accepter no incoming connection? sd = -1, errno 24, too many open files
>>>> accepter no incoming connection? sd = -1, errno 24, too many open files
>>>> accepter no incoming connection? sd = -1, errno 24, too many open files
>>>>
>>>> We found that our max open files limit is set to 200000, but the
>>>> filestore fd cache size is too big, at 500000. I think we have a wrong
>>>> fd configuration. But when there are errors like this in
>>>> Accepter::entry(), it would be better to assert the OSD process, so
>>>> that new RBD clients can connect to the cluster and, when there is a
>>>> network problem, old RBD clients can also reconnect to the cluster.
>>>
>>> If we asserted here, the OSD would just go into an assert loop as it
>>> rebooted, all the clients reconnected, and then they ran into its fd
>>> limit again.
>>
>> Could we add the accepter thread as a heartbeat monitoring target for
>> the OSD? If the accepter stops working, we set the timeout to now, so
>> the OSD is marked unhealthy by the heartbeat check. I know it's not a
>> perfect solution, but it may be a reasonable workaround to mitigate
>> the impact for users who still run the combination of SimpleMessenger
>> and FileStore.
>
> As long as you can get the accept loop to return periodically even when
> idle so that the heartbeat can be updated, then that sounds like it would
> work! I think that means you'll need to switch to select(2) (or some
> variant).

The proposal below is maybe tricky but simpler. When we add the accepter
to the heartbeat map, the initial timeout/suicide_timeout is set to 0. If
the accepter is working well, we never touch timeout/suicide_timeout, so
it escapes the heartbeat check and is always detected as healthy. If the
accepter stops working, we set the timeout to the current time (or
current time + 1s), and at that point the heartbeat check detects the
accepter as unhealthy.

Is it still acceptable?

Thanks,
Chengguang.

>>>
>>> Unfortunately there isn't much we can do about it.
>>> This is a fundamental thing with Linux fd limits and networked
>>> services; you just need to tune it correctly. :(
>>>
>>> It does become less of a problem in later versions with BlueStore
>>> (which doesn't use fds) and AsyncMessenger (which uses just as many
>>> sockets, but fewer threads).
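
To make the two ideas discussed above concrete, here is a minimal
standalone sketch of Sage's suggestion: an accept loop built on select(2)
with a short timeout, so the thread returns to the top of the loop even
when idle and can refresh a heartbeat timestamp. This is not Ceph's actual
Accepter code; in the real OSD the liveness reporting would go through
common/HeartbeatMap, and the names accept_loop and accepter_last_alive are
invented for illustration.

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <atomic>
    #include <cerrno>
    #include <cstdio>
    #include <ctime>

    // Watchdog-visible timestamp: a heartbeat thread checks that this
    // keeps advancing; if it goes stale, the accepter thread is stuck.
    std::atomic<time_t> accepter_last_alive{0};

    void accept_loop(int listen_sd, const std::atomic<bool>& stopping) {
      while (!stopping) {
        accepter_last_alive = time(nullptr);  // prove liveness every pass

        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(listen_sd, &rfds);
        struct timeval tv = {1, 0};           // wake at least once a second

        int r = select(listen_sd + 1, &rfds, nullptr, nullptr, &tv);
        if (r == 0)
          continue;                           // idle: loop back and refresh
        if (r < 0) {
          if (errno == EINTR)
            continue;
          break;                              // unexpected select(2) failure
        }

        int sd = accept(listen_sd, nullptr, nullptr);
        if (sd < 0) {
          // The EMFILE (errno 24) errors from the log above land here;
          // the loop stays alive, so this heartbeat alone would NOT flag
          // that case.
          fprintf(stderr, "accepter: accept failed, errno %d\n", errno);
          continue;
        }
        close(sd);  // placeholder: a real messenger hands sd to a dispatcher
      }
    }

Chengguang's variant inverts the logic: the watchdog deadline stays
disarmed (0) while the accepter is fine, and is armed at "now" only when
the accepter notices it has stopped working, so the next heartbeat check
fails immediately. Again a standalone sketch with invented names, not the
HeartbeatMap API itself:

    #include <atomic>
    #include <ctime>

    std::atomic<time_t> accepter_deadline{0};  // 0 = disarmed, always healthy

    // Called by the accepter on a persistent failure such as EMFILE.
    void mark_accepter_failed() {
      accepter_deadline = time(nullptr);       // arm at "now": expires at once
    }

    // Called periodically by the OSD's heartbeat check.
    bool accepter_healthy() {
      time_t d = accepter_deadline.load();
      return d == 0 || time(nullptr) < d;
    }

The two sketches fail differently on the EMFILE condition that started
this thread: the select(2) loop keeps spinning and so keeps looking
healthy even while every accept(2) call fails, whereas the arm-on-failure
variant can flag exactly that condition.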
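
Greg's "tune it correctly" is the practical fix for the cluster in this
thread: the FileStore fd cache must sit well below the process fd limit,
not above it as in the 500000-vs-200000 setup described. A hedged
ceph.conf sketch, using what I believe are the hammer-era (0.94.x) option
names; verify them against your build, and treat 32768 as an arbitrary
illustrative value rather than a recommendation:

    [global]
            # fd limit the init script applies to the daemon (setrlimit)
            max open files = 200000

    [osd]
            # keep the fd cache far enough below "max open files" that
            # client sockets, including the accepter's, keep headroom
            filestore fd cache size = 32768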