On Fri, 2010-12-03 at 09:59 -0700, Gregory Farnum wrote:
> On Fri, Dec 3, 2010 at 8:48 AM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> > I still see lots of clients resetting osds, but it has no
> > ill effects now.
> This at least is expected -- we realized a few months back that
> connections were never being removed from the OSD if the client
> crashed (didn't send a FIN notification) and had to implement
> timeouts. Having reasonably robust failure handling on each end meant
> we didn't need to do anything clever with keepalives, so we just left
> it. :)

Sure. I only mention it because it suggests that when the osds are
overloaded and causing the resets, a little extra work is being done
to handle them.

> > Separately,
> > This combination has survived the heaviest write loads
> > (64 clients against 13 osds) that I've tested with to date.
> How is this scaling for you on the client side? We're starting to do
> more large-scale testing but haven't gotten through much yet!

Well, I haven't been paying too much attention to performance yet, and
my disks are old and slow (40 MB/s streaming write to a raw block
device), so it doesn't take very many clients to saturate my osds.

However, I have noticed that aggregate throughput stays about the same
once I saturate the osds, no matter how much load I add after that.

With my disks, 13 osds (1/server right now), 2 GiB journal partition,
and replication level 2, I max out at 100-120 MB/s on streaming writes
from lots of clients.

I know that's not exactly what you were asking for, but it's all I've
got so far....

I'm about to start running with 16 osds/server. Stay tuned...

-- Jim

> -Greg

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
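
[A back-of-envelope sketch of why the 100-120 MB/s ceiling above is plausible. It assumes the journal partition sits on the same disk as the data, so each replica write hits the disk twice (journal + data); that journal factor is my assumption, not something stated in the thread.]

```python
# Rough write-throughput ceiling for the setup described in the email.
# Figures from the thread: 13 osds, 40 MB/s disks, replication level 2.
# Assumption (mine): journal shares the data disk, so every client byte
# is written twice per replica -> 2 * 2 = 4 disk writes per client byte.

DISK_MBPS = 40        # streaming write speed of one disk (from the email)
NUM_OSDS = 13         # one osd per server (from the email)
REPLICATION = 2       # replication level (from the email)
JOURNAL_FACTOR = 2    # journal write + data write per replica (assumption)

raw_bandwidth = DISK_MBPS * NUM_OSDS                 # total raw disk bandwidth
write_amplification = REPLICATION * JOURNAL_FACTOR   # disk writes per client byte
ceiling = raw_bandwidth / write_amplification        # aggregate client write rate

print(f"raw disk bandwidth: {raw_bandwidth} MB/s")
print(f"estimated client write ceiling: {ceiling:.0f} MB/s")
```

Under those assumptions the estimate comes out around 130 MB/s, which lines up reasonably with the 100-120 MB/s observed once some protocol and seek overhead is allowed for.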