Re: single-threaded seastar-osd

王豪迈 <haomai@xxxxxxxx> · Wed, 9 Jan 2019 07:22:56 +0000

FoundationDB is a great example of single-threaded design. It uses the
actor programming model to make everything non-blocking, and each fdb
process runs only one thread. In our observation, it used much less CPU
at high ops than a multi-threaded, lock-based design.

FoundationDB natively supports running multiple fdb processes on one
physical device, using an `id` to identify each fdb process. Disk
failure domains are also supported.

Interestingly, it showed very good linear scalability as the number of
fdb processes increased.
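
A minimal sketch of the single-threaded actor idea described above, in
plain standard C++. `RunLoop`, `Counter` and `run_example` are
hypothetical names made up for illustration; this is not FoundationDB's
Flow API, only the general shape: all work is queued as tasks on one
thread, so actor state needs no lock.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical single-threaded run loop: every actor step is a queued task.
class RunLoop {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    // Drain the queue on the calling thread; tasks may post further tasks.
    void run() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Hypothetical actor: its state is only touched by tasks running on the
// loop's single thread, so no mutex is required.
class Counter {
public:
    explicit Counter(RunLoop& loop) : loop_(loop) {}

    // "Sending a message" just enqueues work; the caller never blocks.
    void send_add(int n) { loop_.post([this, n] { value_ += n; }); }

    int value() const { return value_; }

private:
    RunLoop& loop_;
    int value_ = 0;
};

// Posts four messages, then processes them all on one thread.
int run_example() {
    RunLoop loop;
    Counter counter(loop);
    for (int i = 1; i <= 4; ++i) {
        counter.send_add(i);
    }
    loop.run();
    return counter.value();
}
```

Scaling out in this model means starting more such single-threaded
processes rather than adding threads and locks inside one process.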

Gregory Farnum <gfarnum@xxxxxxxxxx> wrote on Wed, Jan 9, 2019 at 1:33 PM:
>
> On Sun, Jan 6, 2019 at 6:30 PM Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx> wrote:
> >
> > On Sat, Jan 5, 2019 at 3:42 PM kefu chai <tchaikov@xxxxxxxxx> wrote:
> >
> > > Recently, while working on the cross-core messenger[0], we found
> > > that, in order to share a connection between cores, we need to have
> > > types like "seastar::lw_shared_ptr<seastar::foreign_ptr<ConnectionRef>>",
> > > because
> > > - the connections to peer OSDs are shared across cores,
> > > - the connections are shared by multiple continuations on the local
> > > core -- either locally or remotely.
> >
> > Well, I perceive this type as a clever solution to get back some of
> > the properties seastar::foreign_ptr had taken from us. What worries
> > me is that we need such heavy artillery at the very early stage of
> > project life.
>
> I think this is actually the key point to address. For all the
> potential futures we have, there are two inescapable facts:
> 1) a single-core OSD will be simpler to program
> 2) a single-core OSD will be dramatically less flexible in the face of
> change than anything else.
>
> So the real question boils down to: is the increased development speed
> worth the HUGE lock-in we get?
>
> I'm not doing much work on it yet, but I tend to think it's not worth it:
> 1) It's not surprising we are setting up "heavy artillery" now — this
> is exactly when we should be setting it up, as we establish the
> infrastructure needs we have which differ from what ScyllaDB needs.
> The messenger-PG crossbar has been much-discussed and was anticipated.
> 2) As I understand it, the complexity we're setting up here is really
> one-time programmer costs which are well-abstracted. So yes, the
> messenger is a little more complicated and the real type we alias to
> ceph:message (or whatever) is larger, but it doesn't impose load on
> everyday developers adding features to RADOS.
> 3) If we restrict ourselves to a single-core OSD, porting the code
> away from that will be nearly impossible. If we establish practices
> from the start which allow us to go multi-core, it's easy to switch to
> single-core with a few type aliases and get whatever speed benefits it
> promises us.
> -Greg
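
A rough sketch of Greg's point 3, assuming feature code only ever sees an
alias. Everything here is hypothetical and written in standard C++:
`CrossCorePtr` is only a stand-in for the role seastar::foreign_ptr
plays (it does not reproduce its API), and `SharedConnection`,
`share_connection`, `deref`, `peer_of` and the `OSD_SINGLE_CORE` macro
are invented names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Connection {
    int peer_osd;
};
using ConnectionRef = std::shared_ptr<Connection>;

// Stand-in for a cross-core owning wrapper (NOT seastar::foreign_ptr):
// it only remembers which core owns the wrapped pointer.
template <typename T>
class CrossCorePtr {
public:
    CrossCorePtr(T ptr, unsigned owner_core)
        : ptr_(std::move(ptr)), owner_core_(owner_core) {}

    typename T::element_type& get() const { return *ptr_; }
    unsigned owner_core() const { return owner_core_; }

private:
    T ptr_;
    unsigned owner_core_;
};

#ifdef OSD_SINGLE_CORE
// Single-core build: the alias collapses to the plain reference type.
using SharedConnection = ConnectionRef;
inline SharedConnection share_connection(ConnectionRef c, unsigned /*owner_core*/) {
    return c;
}
inline Connection& deref(const SharedConnection& c) { return *c; }
#else
// Multi-core build: locally shared handle around a cross-core wrapper.
using SharedConnection = std::shared_ptr<CrossCorePtr<ConnectionRef>>;
inline SharedConnection share_connection(ConnectionRef c, unsigned owner_core) {
    return std::make_shared<CrossCorePtr<ConnectionRef>>(std::move(c), owner_core);
}
inline Connection& deref(const SharedConnection& c) { return c->get(); }
#endif

// Feature code is written against the alias and helpers only, so it
// compiles unchanged in either configuration.
inline int peer_of(const SharedConnection& conn) {
    return deref(conn).peer_osd;
}
```

With this layout, switching to single-core is a build flag plus a couple
of aliases, while going the other way from code written directly against
`ConnectionRef` would mean touching every call site.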