Re: [PATCH] msgr: Correctly handle half-open connections.

Sage Weil <sage@xxxxxxxxxxxx> · Fri, 3 Dec 2010 09:33:38 -0800 (PST)

On Fri, 3 Dec 2010, Jim Schutt wrote:
> On Fri, 2010-12-03 at 09:59 -0700, Gregory Farnum wrote:
> > On Fri, Dec 3, 2010 at 8:48 AM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> > > I still see lots of clients resetting osds, but it has no
> > > ill effects now.
> > This at least is expected -- we realized a few months back that
> > connections were never being removed from the OSD if the client
> > crashed (didn't send a FIN notification) and had to implement
> > timeouts. Having reasonably robust failure handling on each end meant
> > we didn't need to do anything clever with keepalives, so we just left
> > it. :)
> 
> Sure.  I only mention it because it suggests that 
> when the osds are overloaded and causing the resets,
> a little extra work is being done to handle them.

The timeouts can be disabled by mounting with '-o osdtimeout=0'.  They 
are really a band-aid for recovering from OSD problems; in theory, with 
bug-free osd clients, daemons, and msgr, they shouldn't be necessary.  
(Notably, the userspace osd client does not implement timeouts.)
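
For illustration, a kernel-client mount with the timeout disabled might 
look roughly like the following.  The monitor address and mount point are 
placeholders assumed for the example, not values from this thread; only 
the osdtimeout=0 option comes from the text above.

```shell
# Sketch only: 192.168.0.1:6789 (monitor) and /mnt/ceph (mount point)
# are hypothetical placeholders; substitute your own.
# osdtimeout=0 disables the client-side OSD request timeout.
mount -t ceph 192.168.0.1:6789:/ /mnt/ceph -o osdtimeout=0
```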

sage