Re: cross-core communications in seastar-osd


 



I have another idea which could avoid the mandatory cross-core hop. When
the osd boots up, it maps PGs onto a handful of shards (say three), like
shard(pg) = hash(pg) % num_shards. Each shard binds a different port, so
when a client connects to the osd it learned about from the osdmap, we
could add extra logic to the messenger handshake to redirect it to the
expected port. We would still allow a message to be redispatched to
another core, but the client can be made aware of this: it can learn the
mapping from the osd and send messages to the expected port in the
future.
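
A minimal sketch of that mapping, just to make the idea concrete; the
names here (pg_t, num_shards, base_port, pg_to_shard, pg_to_port) are
hypothetical, not existing seastar-osd code:

// hypothetical sketch of the proposed PG -> shard -> port mapping
#include <cstddef>
#include <cstdint>
#include <functional>

struct pg_t {
  uint64_t pool;
  uint32_t seed;    // placement seed of the PG
};

constexpr unsigned num_shards = 3;      // 2 or 3 shards per osd
constexpr uint16_t base_port  = 6800;   // port bound by shard 0

// shard(pg) = hash(pg) % num_shards
inline unsigned pg_to_shard(const pg_t& pg) {
  std::size_t h = std::hash<uint64_t>{}(pg.pool) ^
                  (std::hash<uint32_t>{}(pg.seed) << 1);
  return h % num_shards;
}

// every shard binds base_port + its shard id; a client that knows this
// mapping can connect to the right port directly and skip the redirect.
inline uint16_t pg_to_port(const pg_t& pg) {
  return base_port + pg_to_shard(pg);
}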

I think 2 or 3 shards per osd is enough, so we would have at most 2-3
times as many connections as before. Furthermore, we can use the NIC RX
RSS feature so the kernel can avoid the cross-core switch as well.

kefu chai <tchaikov@xxxxxxxxx> wrote on Fri, Aug 24, 2018 at 6:07 PM:
>
> this is a summary of the discussion on osd-seastar we had in a meeting this week.
>
> seastar uses a share-nothing design to take advantage of multi-core
> hardware, but there are some inherent problems for an OSD. in seastar-osd,
> we will have a sharded osd service listening on a given port on all
> configured cores in parallel using SO_REUSEPORT, so the connections
> are evenly distributed [0] across all seastar reactors.
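
For illustration, the per-core listening setup boils down to something
like the following plain POSIX sketch (not the actual crimson messenger
code); each reactor would create its own listener on the same port:

// sketch only: one SO_REUSEPORT listener per core, plain POSIX sockets
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <stdexcept>

// every core calls this with the same port; the kernel then spreads
// incoming connections across the per-core listening sockets
int make_reuseport_listener(uint16_t port) {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) throw std::runtime_error("socket");
  int one = 1;
  ::setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port = htons(port);
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      ::listen(fd, 128) < 0) {
    ::close(fd);
    throw std::runtime_error("bind/listen");
  }
  return fd;
}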
>
> also, in seastar-osd, sharding PGs across different cores looks like an
> intuitive design. for instance, we can
> - ensure the ordering of osd ops needed to maintain the pglog
> - have better control of the io queue-depth of the storage device
> - maintain a consistent state without extra "locking" of the
> underlying ObjectStore and PG instances.
>
> but we cannot force a client to send requests to a single PG, or only to
> the PGs which happen to be hosted by the core which accepts the connection
> from this client. so i think we can only have a run-to-completion
> session for a request chain targeting a certain PG, and forward each
> request to whichever core hosts the PG it wants to talk to. this
> cross-core communication is inevitable, i think.
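
In other words, the forwarding step could look roughly like the sketch
below, assuming seastar's smp::submit_to(); op_t, shard_of() and
handle_op() are made-up names standing in for the real per-shard PG
service:

// sketch: hand an op over to the core that owns the target PG and run it
// to completion there; only the seastar calls are real, the rest is made up
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>
#include <cstdint>
#include <utility>

struct op_t { /* client request targeting one PG */ };

seastar::future<> handle_op(op_t op);     // hypothetical per-shard handler

unsigned shard_of(uint64_t pg_hash) {
  return pg_hash % seastar::smp::count;   // same hash-based mapping as above
}

seastar::future<> dispatch_op(uint64_t pg_hash, op_t op) {
  unsigned owner = shard_of(pg_hash);
  // submit_to() runs the lambda on the owning reactor and resolves the
  // returned future on the calling core once the op has completed there
  return seastar::smp::submit_to(owner, [op = std::move(op)] () mutable {
    return handle_op(std::move(op));
  });
}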
>
> to avoid a high-traffic client starving the low-traffic connections on a
> certain core, we use the `Throttle` attached to each connection. see
> SocketConnection::maybe_throttle().
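
Conceptually that throttle behaves like the small sketch below, built on
seastar::semaphore; connection_throttle and its methods are hypothetical
names, not the actual Throttle/SocketConnection code:

// sketch: cap the bytes a single connection may have in flight, so one
// busy client cannot starve the other connections served by the same core
#include <seastar/core/future.hh>
#include <seastar/core/semaphore.hh>
#include <cstddef>

class connection_throttle {
  seastar::semaphore budget;              // available in-flight bytes
public:
  explicit connection_throttle(size_t max_bytes) : budget(max_bytes) {}

  // wait until `bytes` fit under the per-connection limit
  seastar::future<> maybe_throttle(size_t bytes) {
    return budget.wait(bytes);
  }

  // give the budget back once the request has been handled
  void release(size_t bytes) {
    budget.signal(bytes);
  }
};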
>
> ---
>
> [0] https://lwn.net/Articles/542629/
> --
> Regards
> Kefu Chai



