Re: [Hammer][Simple Msg] Cluster cannot work when Accepter::entry quits


 



I think this problem is very serious in a production environment.
Does anyone have ideas about this bug?

2018-03-31 6:47 GMT+00:00 xiangyang yu <penglaiyxy@xxxxxxxxx>:
> Hi cephers,
>
> Recently we hit a big problem in our production ceph
> cluster. It had been running very well for one and a half years.
>
> The RBD client network and the ceph public network are separate
> networks that communicate through a router.
>
> Our ceph version is 0.94.5. For IO transport we use the Simple Messenger.
>
> Yesterday some of our VMs (using qemu librbd) could not send IO to the ceph cluster.
>
> Ceph status was healthy: no OSDs went up or down, and no PGs were inactive or down.
>
> When we exported an rbd image with rbd export, we found that the rbd client
> could not connect to one OSD, namely osd.34.
>
> We found that osd.34 was up and running, but its log showed the
> following errors:
> accepter no incoming connection?  sd = -1, errno 24, too many open files.
> (the same line repeated 10 times)
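A minimal sketch for confirming fd exhaustion on the affected host, assuming Linux and its /proc filesystem: count the entries in /proc/<pid>/fd for the OSD process and compare the result with the "Max open files" line in /proc/<pid>/limits. count_open_fds is a hypothetical helper written for illustration, not part of Ceph.

```c
#include <dirent.h>
#include <stdio.h>

/* Count the descriptors a process currently holds by listing
 * /proc/<pid>/fd (Linux-only). Pass "self" for the calling process;
 * in that case the count includes the descriptor opendir() itself uses.
 * Returns -1 if the directory cannot be opened, e.g. the pid does not
 * exist, we lack permission, or this process is itself out of fds. */
long count_open_fds(const char *pid) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%s/fd", pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')   /* skip "." and ".." */
            continue;
        count++;                     /* every other entry is one open fd */
    }
    closedir(dir);
    return count;
}
```

For example, calling count_open_fds("12345") with the ceph-osd pid in place of the placeholder 12345; a value at or near the "Max open files" limit is consistent with the errno 24 (EMFILE) messages in the log.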
>
> We found that our max open files limit is set to 200000, but filestore fd
> cache size is set far too high, at 500000.
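The mismatch comes down to simple arithmetic. A minimal sketch, assuming the fd cache can keep up to "filestore fd cache size" descriptors open at the same time; fd_headroom and soft_nofile_limit are hypothetical helpers, not Ceph functions.

```c
#include <limits.h>
#include <sys/resource.h>

/* Descriptors left for sockets and other files once the fd cache is full.
 * A negative result means the cache alone can exhaust the open-file limit. */
long long fd_headroom(long long max_open_files, long long fd_cache_size) {
    return max_open_files - fd_cache_size;
}

/* Soft RLIMIT_NOFILE of the calling process:
 * LLONG_MAX if unlimited, -1 if getrlimit() fails. */
long long soft_nofile_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return LLONG_MAX;
    return (long long)rl.rlim_cur;
}
```

With the numbers from this report, fd_headroom(200000, 500000) is -300000: under that assumption the cache could hold more descriptors than the process may open, leaving nothing for new client sockets.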
> I think some of our fd settings are wrong. But when Accepter::entry() hits
> errors, it would be better to assert and crash the OSD process, so that the
> cluster marks it down, instead of leaving it up but unable to accept
> connections. Then new rbd clients could connect to the ceph cluster, and
> when there are network problems, existing rbd clients could also reconnect
> to the cluster.
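A minimal sketch of the proposed behavior, written as a standalone POSIX accept loop rather than Ceph's actual Accepter::entry(). classify_accept_error, accept_loop and the handle_connection callback are hypothetical names; which errors count as retryable is an assumption for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* What to do after accept() fails (hypothetical policy, not Ceph's code). */
enum accept_action { ACCEPT_RETRY, ACCEPT_ABORT };

enum accept_action classify_accept_error(int err) {
    switch (err) {
    case EINTR:        /* interrupted by a signal: try again */
    case ECONNABORTED: /* client went away before we accepted it */
        return ACCEPT_RETRY;
    default:           /* includes EMFILE (errno 24 on Linux) and ENFILE */
        return ACCEPT_ABORT;
    }
}

/* Accept loop that never exits quietly: a failure such as EMFILE crashes
 * the daemon so the cluster can notice it is gone, instead of leaving it
 * "up" but unreachable. */
void accept_loop(int listen_fd, void (*handle_connection)(int)) {
    for (;;) {
        int sd = accept(listen_fd, NULL, NULL);
        if (sd >= 0) {
            handle_connection(sd);
            continue;
        }
        int err = errno;
        if (classify_accept_error(err) == ACCEPT_RETRY)
            continue;
        fprintf(stderr, "accepter: accept failed, errno %d (%s), aborting\n",
                err, strerror(err));
        abort();
    }
}
```

Aborting on the first EMFILE is the most aggressive option; a gentler variant would sleep and retry, and abort only after several consecutive failures, so that a brief fd spike does not take the OSD down.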
>
> I do not know whether this has been fixed in a later version.
>
> Best regards,
> brandy


