On Sat, 5 Jan 2019, kefu chai wrote:
> - unable to share the osdmap cache. considering a high-density storage
>   deployment, where more than 40 disks are squeezed into a host, if we
>   are not able to reuse the osdmap cache. that's a shame..

I think this is a bit of a red herring.  We could confine each OSD to a
single core, each with its own distinct messenger, with no cross-core
context switches in the IO path, but still put all (or many) OSDs inside
the same process with a shared OSDMap cache.  The semantics of sharing
the cache are comparatively simple: immutable maps that are only
occasionally added.

> - unable to share the connection to peer OSDs, mon and mgr.
>   probably it's not a big deal in comparison to existing non
>   co-located OSD, but if we compare it with the co-located OSD, well,
>   you'll see what we will be missing.

The big question for me here is whether we would *want* to share
connections (even if we could).  My sense is that we'll get better
performance if we don't, both due to the cross-core traffic inside the
OSD(s), and also in the network itself.

Is there a risk that having lots of TCP connections in the network is
going to be more of a problem for the network itself?  (Something in
the switches?)  I think the usual concerns with many TCP connections
are around scalability of the in-kernel network stack, but if each OSD
has its own local TCP+IP stack in userspace, then I suspect that goes
away?

sage