cross-core communications in seastar-osd

kefu chai <tchaikov@xxxxxxxxx> · Fri, 24 Aug 2018 18:04:15 +0800

this is a summary of the discussion on seastar-osd we had in a meeting this week.

seastar uses a shared-nothing design to take advantage of multi-core
hardware, but it runs into some inherent problems in the OSD. in
seastar-osd, we will have a sharded osd service listening on a given
port on all configured cores in parallel using SO_REUSEPORT, so the
connections are evenly distributed [0] across all seastar reactors.
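
to illustrate what SO_REUSEPORT allows, here is a minimal sketch using
plain linux sockets rather than seastar's own listener. the helper names
`listen_reuseport` and `bound_port` are hypothetical, and the loopback
address and backlog of 128 are arbitrary choices for the example:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// hypothetical helper: open a listening TCP socket on 127.0.0.1:port with
// SO_REUSEPORT set. returns the fd, or -1 on failure. port 0 lets the
// kernel pick a free port.
inline int listen_reuseport(std::uint16_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int one = 1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    // the option must be set before bind() on every socket sharing the port
    if (::setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
        ::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::listen(fd, 128) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// hypothetical helper: port actually bound by fd, or 0 on failure
inline std::uint16_t bound_port(int fd) {
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) return 0;
    return ntohs(addr.sin_port);
}
```

calling `listen_reuseport()` once per core with the same port gives each
core its own listening socket, and the kernel spreads incoming connections
over them.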

also, in seastar-osd, sharding PGs across different cores looks like an
intuitive design. for instance, it lets us
- ensure the order of osd ops to maintain a pglog
- have better control of the io queue-depth of the storage device
- maintain a consistent state without extra "locking" of the
  underlying ObjectStore and PG instances.
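
a minimal sketch of such a PG-to-core mapping, using only the standard
library. `pg_id_t`, `core_for_pg` and the hash-then-modulo placement are
hypothetical names and an assumed policy for illustration, not what the
osd code does:

```cpp
#include <cassert>
#include <cstdint>

// hypothetical PG identifier: pool id plus placement seed
struct pg_id_t {
    std::uint64_t pool;
    std::uint32_t seed;
};

// assumed policy: every PG is pinned to exactly one core, so all ops on
// that PG are serialized by that core's reactor without extra locking
inline unsigned core_for_pg(const pg_id_t& pg, unsigned ncores) {
    assert(ncores > 0);
    // mix pool and seed so PGs of different pools do not line up on the
    // same cores
    std::uint64_t h = pg.pool * 0x9E3779B97F4A7C15ull + pg.seed;
    return static_cast<unsigned>(h % ncores);
}
```

the only property that matters here is that the mapping is deterministic:
the same PG always lands on the same core.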

but we cannot force a client to send requests only to a single PG, or
only to the PGs which happen to be hosted by the core which accepts the
connection from this client. so i think we can only have a
run-to-completion session for a request chain which is targeting a
certain PG, and forward the client's requests to whichever core hosts
the PG it wants to talk to. this cross-core communication is
inevitable, i think.
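
the forwarding step could be sketched as below. in seastar itself the
usual primitive for this is `seastar::smp::submit_to()`, which runs a
function on another shard and returns a future. the sketch only models
the idea with the standard library, and `shard_mailbox` and
`forward_to_owner` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// hypothetical stand-in for one reactor: a mailbox other cores can post to
class shard_mailbox {
    std::mutex lock_;
    std::deque<std::function<void()>> tasks_;
public:
    // called from any core: hand a request over to this shard
    void submit(std::function<void()> task) {
        std::lock_guard<std::mutex> g(lock_);
        tasks_.push_back(std::move(task));
    }
    // called only by the owning core: run queued requests in arrival order
    // and return how many were run
    std::size_t drain() {
        std::deque<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> g(lock_);
            batch.swap(tasks_);
        }
        for (auto& t : batch) t();
        return batch.size();
    }
};

// hypothetical router: the core that accepted the connection forwards an
// op to the core that owns the target PG
inline void forward_to_owner(std::vector<shard_mailbox>& shards,
                             unsigned owner_core,
                             std::function<void()> op) {
    assert(owner_core < shards.size());
    shards[owner_core].submit(std::move(op));
}
```

because only the owning core ever drains its mailbox, ops on one PG still
run in submission order even though they arrive from different cores.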

to avoid a high-traffic client starving low-traffic connections on a
certain core, we use the `Throttle` attached to each connection. see
SocketConnection::maybe_throttle().
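
a rough sketch of the per-connection throttling idea, assuming a simple
budget of in-flight bytes. `conn_throttle` is a hypothetical class for
illustration only; it is not the actual `Throttle` nor what
SocketConnection::maybe_throttle() does internally:

```cpp
#include <cassert>
#include <cstddef>

// hypothetical per-connection budget of in-flight message bytes
class conn_throttle {
    std::size_t max_;
    std::size_t used_ = 0;
public:
    explicit conn_throttle(std::size_t max_bytes) : max_(max_bytes) {}
    // reserve budget for a message. false means this connection should
    // stop reading for now, leaving the core free for other connections
    bool try_get(std::size_t bytes) {
        if (bytes > max_ - used_) return false;
        used_ += bytes;
        return true;
    }
    // give the budget back once the message has been handled
    void put(std::size_t bytes) {
        assert(bytes <= used_);
        used_ -= bytes;
    }
    std::size_t in_flight() const { return used_; }
};
```

since each connection has its own budget, a busy client exhausts only its
own throttle and a quiet connection on the same core can still get in.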

---

[0] https://lwn.net/Articles/542629/
-- 
Regards
Kefu Chai