On Mon, Jan 25, 2016 at 4:20 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Mon, Jan 25, 2016 at 7:03 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> On Mon, Jan 25, 2016 at 3:45 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>> On Mon, Jan 25, 2016 at 5:14 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>> Hi Greg,
>>>>
>>>> With 794c86fd289b ("monc: backoff the timeout period when
>>>> reconnecting") you made it so that the backoff is applied to the hunt
>>>> interval.  When the session is established, the multiplier is reduced
>>>> by 50% and that's it - I don't see any per-tick reduction or anything
>>>> like that.
>>>>
>>>> If a client had some bad luck and couldn't establish the session for
>>>> a while (so that the multiplier went all the way up to 10), its initial
>>>> timeout upon the next connection break is going to be 15 seconds no
>>>> matter how much time has passed in the interim.  Was that your intent?
>>>
>>> I don't remember this, but looking at the sha I logged that behavior
>>> in the commit message, so I'd have to say "yes".  As it says, we're
>>> trying to respond to monitor load; if they're doing so badly that we
>>> had to increase our timeout when re-establishing a session, there's
>>> every chance it will continue to be slow.  If we reset the timeout back
>>> to default, we'd have to go through a lot more monitor-punishing
>>> timeout rounds on the next failure than just cutting it in half would
>>> take.
>>
>> The timeout could have been increased due to intermittent networking
>> issues between the client and the monitor cluster.  The problem I see
>> here is that once it's increased to 30s, it's effectively never
>> decreased - since it's cut in half only once, that MonClient instance
>> is stuck with 15s as its initial timeout forever.
>>
>> I'm not advocating resetting it back to default right away, it's just
>> I expected to see some kind of slow backoff back to default.
>
> Mmm, that might make sense.  There's just also a limit to how much this
> is worth worrying about — longer timeouts are bad only in the presence
> of actually-dead monitors, and only when your connection to one of the
> monitors dies.  Any sort of gradual decay here would require more
> complicated state and some mechanism for determining the monitors have
> gotten happy now.  Maybe you could feed it in based on response times
> of other requests...

Well, a *really* slow decay might not need to check for whether the
monitors are happy or not and so won't require any additional state.

Anyway, I'm not super worried about this either - I'm bringing it into
the kernel client and just wanted to make sure it behaves as intended
before I merge it in.

Thanks,

                Ilya
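
P.S. Just to illustrate the kind of slow decay I mean - a rough,
self-contained sketch; the names (multiplier, DECAY_FACTOR, on_tick(),
etc.) are made up for illustration and don't correspond to the actual
MonClient/libceph fields:

#include <stdio.h>

#define MULT_MIN      1.0   /* default, i.e. no backoff */
#define MULT_MAX     10.0   /* current cap on the multiplier */
#define DECAY_FACTOR  0.9   /* applied on every tick with a healthy session */

static double multiplier = MULT_MIN;

/* failed hunt: back off as today */
static void on_hunt_timeout(void)
{
	multiplier *= 2.0;
	if (multiplier > MULT_MAX)
		multiplier = MULT_MAX;
}

/* periodic tick with an established session: drift back towards 1.0 */
static void on_tick(void)
{
	multiplier *= DECAY_FACTOR;
	if (multiplier < MULT_MIN)
		multiplier = MULT_MIN;
}

int main(void)
{
	int i;

	for (i = 0; i < 5; i++)
		on_hunt_timeout();
	printf("after backoff: %.2f\n", multiplier);	/* 10.00 */

	/* assuming a tick on the order of 10s, this is a few minutes */
	for (i = 0; i < 22; i++)
		on_tick();
	printf("after decay:   %.2f\n", multiplier);	/* ~1.00 */

	return 0;
}

No extra state beyond the multiplier we already keep - the decay just
piggybacks on the existing tick.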