On Fri, 3 Dec 2010, Jim Schutt wrote:
> On Fri, 2010-12-03 at 09:59 -0700, Gregory Farnum wrote:
> > On Fri, Dec 3, 2010 at 8:48 AM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> > > I still see lots of clients resetting osds, but it has no
> > > ill effects now.
> > This at least is expected -- we realized a few months back that
> > connections were never being removed from the OSD if the client
> > crashed (didn't send a FIN notification) and had to implement
> > timeouts. Having reasonably robust failure handling on each end meant
> > we didn't need to do anything clever with keepalives, so we just left
> > it. :)
>
> Sure. I only mention it because it suggests that
> when the osds are overloaded and causing the resets,
> a little extra work is being done to handle them.

The timeouts can be disabled by mounting with '-o osdtimeout=0'. They
are really a band-aid to recover from OSD problems; in theory, with
non-buggy, functional osd clients, daemons, and msgr, they shouldn't be
necessary. (Notably, the userspace osd client does not implement
timeouts.)

sage