On Sat, 5 Jan 2019, kefu chai wrote:
> - unable to share the osdmap cache. considering a high-density storage
>   deployment, where more than 40 disks are squeezed into a host, if we
>   are not able to reuse the osdmap cache. that's a shame..

I think this is a bit of a red herring.  We could confine each OSD to a
single core, each with its own distinct messenger, with no cross-core
context switches in the IO path, but still put all (or many) OSDs inside
the same process with a shared OSDMap cache.  The semantics of sharing
the cache are comparatively simple: immutable maps that are only
occasionally added.

> - unable to share the connection to peer OSDs, mon and mgr.
>   probably it's not a big deal in comparison to existing non
>   co-located OSD, but if we compare it with the co-located OSD, well,
>   you'll see what we will be missing.

The big question for me here is whether we would *want* to share
connections (even if we could).  My sense is that we'll get better
performance if we don't, both due to the cross-core traffic inside the
OSD(s), and also in the network itself.

Is there a risk that having lots of TCP connections in the network is
going to be more of a problem for the network itself?  (Something in
the switches?)  I think the usual concerns with many TCP connections
are around scalability of the in-kernel network stack, but if each OSD
has its own local TCP+IP stack in userspace, then I suspect that goes
away?

sage