On Apr 3, 2018, at 9:20 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> On Tue, 3 Apr 2018, cgxu519@xxxxxxx wrote:
>>> On Apr 3, 2018, at 1:56 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>
>>> On Fri, Mar 30, 2018 at 11:47 PM, xiangyang yu <penglaiyxy@xxxxxxxxx> wrote:
>>>> Hi cephers,
>>>>
>>>> Recently we hit a big problem in our production Ceph cluster. It had
>>>> been running very well for one and a half years.
>>>>
>>>> The RBD client network and the Ceph public network are different,
>>>> communicating through a router.
>>>>
>>>> Our Ceph version is 0.94.5, and our IO transport uses SimpleMessenger.
>>>>
>>>> Yesterday some of our VMs (using qemu librbd) could not send IO to the
>>>> Ceph cluster.
>>>>
>>>> Ceph status was healthy: no OSDs went up or down and no PGs were
>>>> inactive or down.
>>>>
>>>> When we exported an RBD image through rbd export, we found the RBD
>>>> client could not connect to one OSD, say osd.34.
>>>>
>>>> We found that osd.34 was up and running, but in its log we saw the
>>>> following error, repeated continuously:
>>>>
>>>> accepter no incoming connection? sd = -1, errno 24, too many open files
>>>> accepter no incoming connection? sd = -1, errno 24, too many open files
>>>> accepter no incoming connection? sd = -1, errno 24, too many open files
>>>>
>>>> We found that our max open files limit is set to 200000, but the
>>>> filestore fd cache size is too big, at 500000. I think we have a wrong
>>>> fd configuration. But when there are errors like this in
>>>> Accepter::entry(), it would be better to assert the OSD process, so
>>>> that new RBD clients can connect to the cluster and, when there is a
>>>> network problem, old RBD clients can also reconnect to the cluster.
>>>
>>> If we asserted here, the OSD would just go into an assert loop as it
>>> rebooted, all the clients reconnected, and then they ran into its fd
>>> limit again.
>>
>> Could we add the accepter thread as a heartbeat monitoring target for
>> the OSD? If the accepter stops working, we set the timeout to now, so
>> the OSD is marked unhealthy by the heartbeat check. I know it's not a
>> perfect solution, but it may be a reasonable workaround to mitigate
>> the impact for users who still run the combination of SimpleMessenger
>> and FileStore.
>
> As long as you can get the accept loop to return periodically even when
> idle so that the heartbeat can be updated, then that sounds like it would
> work! I think that means you'll need to switch to select(2) (or some
> variant).

The proposal below is maybe tricky but simpler. When we add the accepter
to the heartbeat map, the initial timeout/suicide_timeout is set to 0. If
the accepter is working well, we never touch timeout/suicide_timeout, so
it escapes the heartbeat check and is always detected as healthy. If the
accepter stops working, we set the timeout to the current time (or
current time + 1s), and at that point the heartbeat check detects the
accepter as unhealthy.

Is it still acceptable?

Thanks,
Chengguang.

>>>
>>> Unfortunately there isn't much we can do about it.
>>> This is a fundamental thing with Linux fd limits and networked
>>> services; you just need to tune it correctly. :(
>>>
>>> It does become less of a problem in later versions with BlueStore
>>> (which doesn't use fds) and AsyncMessenger (which uses just as many
>>> sockets, but fewer threads).
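
To make the two ideas discussed above concrete, here is a minimal
standalone sketch of Sage's suggestion: an accept loop built on select(2)
with a short timeout, so the thread returns to the top of the loop even
when idle and can refresh a heartbeat timestamp. This is not Ceph's actual
Accepter code; in the real OSD the liveness reporting would go through
common/HeartbeatMap, and the names accept_loop and accepter_last_alive are
invented for illustration.

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <atomic>
    #include <cerrno>
    #include <cstdio>
    #include <ctime>

    // Watchdog-visible timestamp: a heartbeat thread checks that this
    // keeps advancing; if it goes stale, the accepter thread is stuck.
    std::atomic<time_t> accepter_last_alive{0};

    void accept_loop(int listen_sd, const std::atomic<bool>& stopping) {
      while (!stopping) {
        accepter_last_alive = time(nullptr);  // prove liveness every pass

        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(listen_sd, &rfds);
        struct timeval tv = {1, 0};           // wake at least once a second

        int r = select(listen_sd + 1, &rfds, nullptr, nullptr, &tv);
        if (r == 0)
          continue;                           // idle: loop back and refresh
        if (r < 0) {
          if (errno == EINTR)
            continue;
          break;                              // unexpected select(2) failure
        }

        int sd = accept(listen_sd, nullptr, nullptr);
        if (sd < 0) {
          // The EMFILE (errno 24) errors from the log above land here;
          // the loop stays alive, so this heartbeat alone would NOT flag
          // that case.
          fprintf(stderr, "accepter: accept failed, errno %d\n", errno);
          continue;
        }
        close(sd);  // placeholder: a real messenger hands sd to a dispatcher
      }
    }

Chengguang's variant inverts the logic: the watchdog deadline stays
disarmed (0) while the accepter is fine, and is armed at "now" only when
the accepter notices it has stopped working, so the next heartbeat check
fails immediately. Again a standalone sketch with invented names, not the
HeartbeatMap API itself:

    #include <atomic>
    #include <ctime>

    std::atomic<time_t> accepter_deadline{0};  // 0 = disarmed, always healthy

    // Called by the accepter on a persistent failure such as EMFILE.
    void mark_accepter_failed() {
      accepter_deadline = time(nullptr);       // arm at "now": expires at once
    }

    // Called periodically by the OSD's heartbeat check.
    bool accepter_healthy() {
      time_t d = accepter_deadline.load();
      return d == 0 || time(nullptr) < d;
    }

The two sketches fail differently on the EMFILE condition that started
this thread: the select(2) loop keeps spinning and so keeps looking
healthy even while every accept(2) call fails, whereas the arm-on-failure
variant can flag exactly that condition.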
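
Greg's "tune it correctly" is the practical fix for the cluster in this
thread: the FileStore fd cache must sit well below the process fd limit,
not above it as in the 500000-vs-200000 setup described. A hedged
ceph.conf sketch, using what I believe are the hammer-era (0.94.x) option
names; verify them against your build, and treat 32768 as an arbitrary
illustrative value rather than a recommendation:

    [global]
            # fd limit the init script applies to the daemon (setrlimit)
            max open files = 200000

    [osd]
            # keep the fd cache far enough below "max open files" that
            # client sockets, including the accepter's, keep headroom
            filestore fd cache size = 32768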