Regards,
Zhi Zhang (David)
Contact: zhang.david2011@xxxxxxxxx
         zhangz.david@xxxxxxxxxxx

---------- Forwarded message ----------
From: Jaze Lee <jazeltq@xxxxxxxxx>
Date: Mon, Dec 21, 2015 at 4:08 PM
Subject: Re: Client still connect failed leader after that mon down
To: Zhi Zhang <zhang.david2011@xxxxxxxxx>

Hello,
I am terribly sorry. I think we may not need to rework MonClient.{h,cc}
after all: we found that the parameter mon_client_hunt_interval is very
useful. When we set mon_client_hunt_interval = 0.5, a ceph command runs
quickly even when the client first tries to connect to the down leader mon.

I asked the original question because we took the parameter name from the
official site,
http://docs.ceph.com/docs/master/rados/configuration/mon-config-ref/.
It is written there as:

mon client hung interval
Description: The client will try a new monitor every N seconds until it
establishes a connection.
Type: Double
Default: 3.0

We set that, and it did not work. I think it may be a typo? The correct
configuration parameter should be

mon client hunt interval

Can someone please help me fix this on the official site?

Thanks a lot.

2015-12-21 14:00 GMT+08:00 Jaze Lee <jazeltq@xxxxxxxxx>:
> Right now we use the simple msg, and the ceph version is 0.80...
>
> 2015-12-21 10:55 GMT+08:00 Zhi Zhang <zhang.david2011@xxxxxxxxx>:
>> Which msg type and ceph version are you using?
>>
>> Once we used 0.94.1 with async msg, we encountered a similar issue.
>> The client was trying to connect to a down monitor when it had just started
>> and this connection would hang there. This is because the previous async
>> msg used blocking connection mode.
>>
>> After we backported the non-blocking mode of async msg from a higher ceph
>> version, we haven't encountered such an issue yet.
>>
>>
>> Regards,
>> Zhi Zhang (David)
>> Contact: zhang.david2011@xxxxxxxxx
>>          zhangz.david@xxxxxxxxxxx
>>
>>
>> On Fri, Dec 18, 2015 at 11:41 AM, Jevon Qiao <scaleqiao@xxxxxxxxx> wrote:
>>> On 17/12/15 21:27, Sage Weil wrote:
>>>>
>>>> On Thu, 17 Dec 2015, Jaze Lee wrote:
>>>>>
>>>>> Hello cephers:
>>>>> In our test, there are three monitors. We find that a client running a
>>>>> ceph command is slow when the leader mon is down. Even after a long time,
>>>>> a client's first ceph command is still slow.
>>>>> From strace, we find that the client first tries to connect to the
>>>>> leader, then after 3s, it connects to the second mon.
>>>>> After some searching we find that the quorum has not changed; the leader
>>>>> is still the down monitor.
>>>>> Is that normal? Or is there something I missed?
>>>>
>>>> It's normal. Even when the quorum does change, the client doesn't
>>>> know that. It should be contacting a random mon on startup, though, so I
>>>> would expect the 3s delay 1/3 of the time.
>>>
>>> That's because the client randomly picks a mon from the monmap. But what we
>>> observed is that when a mon is down no change is made to the monmap
>>> (neither the epoch nor the members). Is that the culprit for this
>>> phenomenon?
>>>
>>> Thanks,
>>> Jevon
>>>
>>>> A long-standing low-priority feature request is to have the client contact
>>>> 2 mons in parallel so that it can still connect quickly if one is down.
>>>> It requires some non-trivial work in mon/MonClient.{cc,h} though and I
>>>> don't think anyone has looked at it seriously.
>>>>
>>>> sage
>
>
>
> --
> 谦谦君子

--
谦谦君子
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
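
As a rough illustration of the numbers discussed in this thread (a 3 s delay about one time in three, and a much shorter delay with mon_client_hunt_interval = 0.5), here is a toy model in Python. It is only a sketch under stated assumptions: the client picks one monitor uniformly at random, waits one hunt interval if that monitor is down, then picks a different one. It does not reproduce the actual logic in mon/MonClient.{cc,h}, and the function names are hypothetical.

```python
import random


def simulate_first_connect_delay(num_mons=3, down_mon=0, hunt_interval=3.0, rng=None):
    """Toy model (assumption, not Ceph code): delay before reaching a live mon.

    The client picks a monitor at random. If it is the down one, the client
    waits `hunt_interval` seconds and picks again among the remaining monitors.
    """
    rng = rng or random.Random()
    delay = 0.0
    candidates = list(range(num_mons))
    while True:
        choice = rng.choice(candidates)
        if choice != down_mon:
            return delay
        # The chosen monitor is down: wait one hunt interval, then try another.
        delay += hunt_interval
        candidates.remove(choice)


def mean_delay(trials, **kwargs):
    """Average simulated delay over `trials` runs, with a fixed seed."""
    rng = random.Random(42)
    total = sum(simulate_first_connect_delay(rng=rng, **kwargs) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for interval in (3.0, 0.5):
        print(interval, mean_delay(100_000, hunt_interval=interval))
```

In this model the delay is either zero or exactly one hunt interval, so the mean is hunt_interval / num_mons: about 1 s with the default of 3.0 and about 0.17 s with 0.5, with the worst case dropping from 3 s to 0.5 s.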