> -----Original Message-----
> From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
> Sent: Thursday, December 20, 2018 3:09 PM
> To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> Cc: The Esoteric Order of the Squid Cybernetic <ceph-devel@xxxxxxxxxxxxxxx>;
> Kefu Chai <kchai@xxxxxxxxxx>; Cheng, Yingxin <yingxin.cheng@xxxxxxxxx>; Ma,
> Jianpeng <jianpeng.ma@xxxxxxxxx>; Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx>
> Subject: Re: seastar crimson --- pglock solution discussion
>
> On Thu, Dec 20, 2018 at 2:59 PM Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
> >
> > > -----Original Message-----
> > > From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
> > > Sent: Wednesday, December 19, 2018 3:15 PM
> > > To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> > > Cc: The Esoteric Order of the Squid Cybernetic
> > > <ceph-devel@xxxxxxxxxxxxxxx>; Kefu Chai <kchai@xxxxxxxxxx>; Cheng,
> > > Yingxin <yingxin.cheng@xxxxxxxxx>; Ma, Jianpeng
> > > <jianpeng.ma@xxxxxxxxx>; Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx>
> > > Subject: Re: seastar crimson --- pglock solution discussion
> > >
> > > On Mon, Dec 17, 2018 at 4:23 PM Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > In order to keep the IO request sequence within one PG, the OSD uses
> > > > pglock to guarantee the ordering. Crimson is lockless, so we can use
> > > > future/promise to do the same work.
> > > >
> > > > We can design it so that each PG has its own IO request queue in its
> > > > seastar-crimson shard, and each PG has one member:
> > > >
> > > >     seastar::promise<> pg_ready;
> > > >
> > > > Where we would need pglock.lock(), we use the following logic instead:
> > > >
> > > >     return pg_ready.get_future()
> > > >       // once the pg_ready promise is satisfied later, the future
> > > >       // is fulfilled here
> > > >       .then([this] {
> > > >         // reset pg_ready to a fresh promise with no future yet
> > > >         pg_ready = seastar::promise<>{};
> > > >         // dequeue an IO from the PG's request queue and continue
> > > >         // with the usual OSD processing
> > > >       });
> > > >
> > > > Where we would need pglock.unlock(), we use the following logic
> > > > instead:
> > > >
> > > >     .then_wrapped([this] (auto fut) {
> > > >         // satisfy the pg_ready promise
> > > >         fut.forward_to(std::move(pg_ready));
> > > >     });
> > > >
> > > > So the next IO request in the PG queue will not be dequeued until
> > > > the pg_ready promise is satisfied, which happens after the prior
> > > > request has been processed in the OSD.
> > > >
> > > > Do you think it is workable?
> > >
> > > Have we considered *not* using a "global" pglock and instead
> > > tracking dependencies more carefully?
> > >
> > > IIRC, in the current model we use the pg lock for two different
> > > kinds of things:
> > > 1) to prevent mutating in-memory state incorrectly across racing
> > > threads,
> > > 2) to provide ordering of certain kinds of operations (eg, reads of
> > > in-progress writes)
> >
> > There is also 3): the pglog needs to be sequenced. The pglog entry of
> > the first IO request should be written first, for consistency across
> > replicas (recovery uses the pglog head/tail pointers).
>
> Right.
>
> > > In Seastar, we shouldn't need to worry about (1) at all.
> >
> > Yes, that is correct, since each PG belongs to only one seastar thread.
> >
> > > (2) is of course more tricky, but it seems like we ought to be able
> > > to do tracking more easily so as to condition dependencies
> > > explicitly on the dependency. For instance, we can condition a write
> > > operation being applied to the object store on its preceding pg log
> > > operation being done; we can condition reads proceeding on not having
> > > a write to the same object in progress, etc.
> >
> > How would we express such conditions in Crimson? Can you give an
> > example here?
> >
> > (3) Since Crimson code runs in async mode, how do we guarantee that
> > pglog writes happen in sequence?
>
> I haven't worked with the Crimson code directly, but I assume we'd have
> some kind of sequencer, and that there are pre-existing futures around
> the operations being completed or stored on disk.
>
> So couldn't we get those futures back when getting a pglog, and condition
> our own steps on those being done at the right points? Or would that be
> too expensive to track?

What I suggested above is to use future/promise to guarantee the sequence;
it does not block, but gives the same result as pglock. After the first IO
request is sent to ObjectStore, the promise is satisfied, the code gets the
future back, and it dequeues the next IO request from the same PG queue.

But for writes and reads of one object we need to consider more, since the
write is handled by ObjectStore and the read is done by the PG layer.
Previously the ceph osd used ondisk_write_lock and ondisk_read_lock to
guarantee the write/read sequence on one object, but the current code has
no such locks, and I am not sure what mechanism is used now. I will check
the code.

> -Greg
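
For illustration, here is a minimal sketch of the per-PG queue idea
discussed above. It is plain standard C++ rather than Seastar, so it only
models the ordering: a busy flag plus a completion callback stand in for
the pg_ready promise/future pair. The names PgSequencer, submit and
demo_order are hypothetical and are not taken from the Crimson code.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one PG's request sequencer (not Crimson code).
// Assumption: everything runs on a single thread, like one seastar shard.
class PgSequencer {
public:
    // A request receives a `done` callback and must call it exactly once,
    // possibly later, when its processing has finished.
    using Done = std::function<void()>;
    using Request = std::function<void(Done)>;

    // Plays the role of pglock.lock(): run now if the PG is idle,
    // otherwise wait in FIFO order behind the request in flight.
    void submit(Request req) {
        queue_.push_back(std::move(req));
        if (!busy_) start_next();
    }

    bool idle() const { return !busy_ && queue_.empty(); }

private:
    void start_next() {
        if (queue_.empty()) return;
        Request req = std::move(queue_.front());
        queue_.pop_front();
        busy_ = true;
        req([this] { complete(); });
    }

    // Plays the role of pglock.unlock(): the request in flight is done,
    // so the next queued request may be dequeued.
    void complete() {
        assert(busy_);
        busy_ = false;
        start_next();
    }

    std::deque<Request> queue_;
    bool busy_ = false;
};

// Submits three requests whose completions are delivered later, and
// returns the order in which they started and finished.
std::vector<std::string> demo_order() {
    PgSequencer pg;
    std::vector<std::string> log;
    std::deque<PgSequencer::Done> in_flight;  // completions not yet delivered
    const std::vector<std::string> names = {"w1", "w2", "r3"};

    for (const std::string& name : names) {
        pg.submit([&log, &in_flight, name](PgSequencer::Done done) {
            log.push_back("start " + name);
            in_flight.push_back([&log, name, done] {
                log.push_back("finish " + name);
                done();
            });
        });
    }
    // Deliver completions one at a time, as an event loop would.
    while (!in_flight.empty()) {
        PgSequencer::Done cb = std::move(in_flight.front());
        in_flight.pop_front();
        cb();
    }
    assert(pg.idle());
    return log;
}
```

Calling demo_order() shows that a request only starts after the previous
one on the same PG has finished, even though all three were submitted up
front. This models only the "global" per-PG ordering; the finer-grained
per-object dependencies that Greg suggests would need separate tracking.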