On Thu, Dec 20, 2018 at 2:59 PM Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote: > > > > > -----Original Message----- > > From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx] > > Sent: Wednesday, December 19, 2018 3:15 PM > > To: Liu, Chunmei <chunmei.liu@xxxxxxxxx> > > Cc: The Esoteric Order of the Squid Cybernetic <ceph-devel@xxxxxxxxxxxxxxx>; > > Kefu Chai <kchai@xxxxxxxxxx>; Cheng, Yingxin <yingxin.cheng@xxxxxxxxx>; Ma, > > Jianpeng <jianpeng.ma@xxxxxxxxx>; Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx> > > Subject: Re: seastar crimson --- pglock solution discussion > > > > On Mon, Dec 17, 2018 at 4:23 PM Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote: > > > > > > Hi all, > > > > > > In order to keep IO request sequence in one pg, osd use pglock to guarantee > > the sequence. Here in Crimson, it is lockless, so we use future/promise to do the > > same work. > > > > > > We can design Each PG has its own IO request queue in > > > seastar-crimson shard. And each PG has one member seastar::promise<> > > > pg_ready; > > > > > > When need pglock.lock(), we use the following logic to instead: > > > > > > return pg_ready.get_future() > > //after satisfy the pg_ready promise later then the future will be fulfilled here. > > > .then([this] { > > > Pg_ready = seastar::promise<>{}; // set > > promise pg_ready no future. > > > Dequeue io from pg's request queue and do osd > > following process. > > > }); > > > > > > When need pglock.unlock(), we use the following logic to instead: > > > then_wrapped([this] (auto fut) { > > > fut.forward_to(std::move(pg_ready)); // satisfy the > > pg_ready promise > > > }); So the next IO request in the > > > PG queue will not be dequeued until the pg_ready promise is satisfied after the > > prior request has already been processed in OSD. > > > > > > Do you think it is workable? > > > > Have we considered *not* using a "global" pglock and instead tracking > > dependencies more carefully? > > > > IIRC, in the current model we use the pg lock for two different kinds of things > > 1) to prevent mutating in-memory state incorrectly across racing threads, > > 2) to provide ordering of certain kinds of operations (eg, reads of in-progress > > writes) > > Another 3) pglog need to be sequenced, first IO request pglog should write first, for replicators consistency. (use pglog head/tail pointer to do recovery) Right. > > In Seastar, we shouldn't need to worry about (1) at all. > > Yes, that is correct. Since each pg only belong to one seastar thread. > > > (2) is of course more tricky, but it seems like we ought to be able to do tracking > > more easily so as to condition dependencies explicitly on the dependency. For > > instance, we can condition a write operation being applied to the object store > > on its preceding pg log operation being done; we can condition reads > > proceeding on not having a write to the same object in progress, etc. > > > How to do the condition in crimson? Can you give an example here? > > (3) Since Crimson code run in async mode, how to grantee pglog write in sequence? I haven't worked with the Crimson code directly, but I assume we'd have some kind of sequencer, and that there are pre-existing futures around the operations being completed or stored on disk. So couldn't we get those futures back when getting a pglog, and condition our own steps on those being done at the right points? Or would that be too expensive to track? -Greg