> -----Original Message-----
> From: kefu chai [mailto:tchaikov@xxxxxxxxx]
> Sent: Thursday, February 21, 2019 3:51 AM
> To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> Cc: Gregory Farnum <gfarnum@xxxxxxxxxx>; The Esoteric Order of the Squid
> Cybernetic <ceph-devel@xxxxxxxxxxxxxxx>; Kefu Chai <kchai@xxxxxxxxxx>;
> Cheng, Yingxin <yingxin.cheng@xxxxxxxxx>; Ma, Jianpeng
> <jianpeng.ma@xxxxxxxxx>; Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx>
> Subject: Re: seastar crimson --- pglock solution discussion
>
> On Thu, Feb 21, 2019 at 9:32 AM Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
> >
> > It seems we can use seastar::with_lock(shared_mutex sm) instead of
> > pglock in crimson-osd.
>
> I don't think it makes much difference to the granularity of the lock
> whether we use a POSIX lock or its Seastar counterpart. The point is:
> should we have a PG lock at all? One way to serialize the ops is to chain
> the write ops onto a shared_future, which could be a member variable of
> the pg log. See also SocketConnection::close_ready.

Kefu, I don't understand how a shared_future can guarantee request
ordering within the same PG. Can you give an example?
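One possible reading of the shared_future suggestion, as a minimal sketch only: keep a shared_future as the tail of the PG's op chain, make each new op wait on the current tail before it runs, and then make that op's completion the new tail. The model below uses only the C++ standard library so that it compiles on its own. PgSequencer and submit() are hypothetical names, not Crimson or Seastar APIs, and std::shared_future::wait() blocks a thread where Seastar would instead schedule a continuation.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <utility>

// Hypothetical per-PG sequencer: a standard-library model of the idea,
// not Seastar code and not what crimson-osd does.
class PgSequencer {
public:
    PgSequencer() {
        // Start with an already-satisfied tail so the first op can run at once.
        std::promise<void> ready;
        ready.set_value();
        tail_ = ready.get_future().share();
    }

    // Chain `op` behind every op submitted earlier; return its completion.
    std::shared_future<void> submit(std::function<void()> op) {
        assert(op && "op must be callable");
        std::shared_future<void> prev = tail_;
        tail_ = std::async(std::launch::async,
                           [prev, op = std::move(op)] {
                               prev.wait();  // blocks here; Seastar would chain a continuation
                               op();         // runs only after the previous op finished
                           }).share();
        return tail_;
    }

private:
    std::shared_future<void> tail_;  // completion of the most recently submitted op
};
```

Submitting three ops and waiting on the future returned by the last one runs them in submission order, even though each op gets its own thread in this model. No lock is taken; the ordering comes only from each op depending on its predecessor's future.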
> >
> > > -----Original Message-----
> > > From: ceph-devel-owner@xxxxxxxxxxxxxxx
> > > [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Liu, Chunmei
> > > Sent: Thursday, December 20, 2018 3:30 PM
> > > To: 'Gregory Farnum' <gfarnum@xxxxxxxxxx>
> > > Cc: The Esoteric Order of the Squid Cybernetic
> > > <ceph-devel@xxxxxxxxxxxxxxx>; Kefu Chai <kchai@xxxxxxxxxx>; Cheng,
> > > Yingxin <yingxin.cheng@xxxxxxxxx>; Ma, Jianpeng
> > > <jianpeng.ma@xxxxxxxxx>; Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx>
> > > Subject: RE: seastar crimson --- pglock solution discussion
> > >
> > > > -----Original Message-----
> > > > From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
> > > > Sent: Thursday, December 20, 2018 3:09 PM
> > > > To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> > > > Cc: The Esoteric Order of the Squid Cybernetic
> > > > <ceph-devel@xxxxxxxxxxxxxxx>; Kefu Chai <kchai@xxxxxxxxxx>; Cheng,
> > > > Yingxin <yingxin.cheng@xxxxxxxxx>; Ma, Jianpeng
> > > > <jianpeng.ma@xxxxxxxxx>; Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx>
> > > > Subject: Re: seastar crimson --- pglock solution discussion
> > > >
> > > > On Thu, Dec 20, 2018 at 2:59 PM Liu, Chunmei
> > > > <chunmei.liu@xxxxxxxxx> wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
> > > > > > Sent: Wednesday, December 19, 2018 3:15 PM
> > > > > > To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> > > > > > Cc: The Esoteric Order of the Squid Cybernetic
> > > > > > <ceph-devel@xxxxxxxxxxxxxxx>; Kefu Chai <kchai@xxxxxxxxxx>;
> > > > > > Cheng, Yingxin <yingxin.cheng@xxxxxxxxx>; Ma, Jianpeng
> > > > > > <jianpeng.ma@xxxxxxxxx>; Radoslaw Zarzynski
> > > > > > <rzarzyns@xxxxxxxxxx>
> > > > > > Subject: Re: seastar crimson --- pglock solution discussion
> > > > > >
> > > > > > On Mon, Dec 17, 2018 at 4:23 PM Liu, Chunmei
> > > > > > <chunmei.liu@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > In order to keep the IO request sequence within one PG, the OSD
> > > > > > > uses pglock to guarantee the ordering. Crimson is lockless, so
> > > > > > > we use future/promise to do the same work.
> > > > > > >
> > > > > > > We can design it so that each PG has its own IO request queue in
> > > > > > > its seastar-crimson shard, and each PG has one member:
> > > > > > >
> > > > > > >     seastar::promise<> pg_ready;
> > > > > > >
> > > > > > > Where we need pglock.lock(), we use the following logic instead:
> > > > > > >
> > > > > > >     // The future resolves once the pg_ready promise is satisfied.
> > > > > > >     return pg_ready.get_future()
> > > > > > >       .then([this] {
> > > > > > >         // Reset pg_ready to a fresh promise with no future taken yet.
> > > > > > >         pg_ready = seastar::promise<>{};
> > > > > > >         // Dequeue an IO from the PG's request queue and continue
> > > > > > >         // with the usual OSD processing.
> > > > > > >       });
> > > > > > >
> > > > > > > Where we need pglock.unlock(), we use the following logic instead:
> > > > > > >
> > > > > > >       // Chained onto the future of the request being processed.
> > > > > > >       .then_wrapped([this] (auto fut) {
> > > > > > >         fut.forward_to(std::move(pg_ready));  // satisfy the pg_ready promise
> > > > > > >       });
> > > > > > >
> > > > > > > So the next IO request in the PG queue will not be dequeued until
> > > > > > > the pg_ready promise is satisfied, which happens after the prior
> > > > > > > request has been processed in the OSD.
> > > > > > >
> > > > > > > Do you think it is workable?
> > > > > >
> > > > > > Have we considered *not* using a "global" pglock and instead
> > > > > > tracking dependencies more carefully?
> > > > > >
> > > > > > IIRC, in the current model we use the pg lock for two
> > > > > > different kinds of things:
> > > > > > 1) to prevent mutating in-memory state incorrectly across
> > > > > > racing threads,
> > > > > > 2) to provide ordering of certain kinds of operations (eg,
> > > > > > reads of in-progress writes)
> > > > >
> > > > > There is also 3): the pglog needs to be sequenced. The pglog entry
> > > > > of the first IO request should be written first, for consistency
> > > > > across replicas (the pglog head/tail pointers are used for recovery).
> > > >
> > > > Right.
> > > >
> > > > > > In Seastar, we shouldn't need to worry about (1) at all.
> > > > >
> > > > > Yes, that is correct, since each PG belongs to only one Seastar thread.
> > > > >
> > > > > > (2) is of course more tricky, but it seems like we ought to be
> > > > > > able to do tracking more easily so as to condition
> > > > > > dependencies explicitly on the dependency. For instance, we
> > > > > > can condition a write operation being applied to the object
> > > > > > store on its preceding pg log operation being done; we can
> > > > > > condition reads proceeding on not having a write to the same
> > > > > > object in progress, etc.
> > > > >
> > > > > How do we express such a condition in Crimson? Can you give an
> > > > > example here?
> > > > >
> > > > > (3) Since Crimson code runs in async mode, how do we guarantee that
> > > > > the pglog is written in sequence?
> > > >
> > > > I haven't worked with the Crimson code directly, but I assume we'd
> > > > have some kind of sequencer, and that there are pre-existing
> > > > futures around the operations being completed or stored on disk.
> > > >
> > > > So couldn't we get those futures back when getting a pglog, and
> > > > condition our own steps on those being done at the right points?
> > > > Or would that be too expensive to track?
> > >
> > > What I suggested above is to use future/promise to guarantee the
> > > ordering, which does not block but gives the same result as pglock.
> > > After the first IO request is sent to the ObjectStore, the promise is
> > > satisfied; the code then gets the future back and dequeues the next IO
> > > request from the same PG queue.
> > > But for writes and reads of one object we need to consider more, since
> > > writes are handled by the ObjectStore and reads are done by the PG
> > > layer. Previously the Ceph OSD used ondisk_write_lock and
> > > ondisk_read_lock to guarantee the write/read ordering on one object,
> > > but the current code no longer has those locks, and I am not sure what
> > > mechanism is used now. I will check the code.
> > >
> > > > -Greg
>
> --
> Regards
> Kefu Chai
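The idea raised earlier in the thread, conditioning a read on there being no write in progress to the same object, can be modelled without any PG-wide lock. The sketch below only illustrates that kind of dependency tracking under stated assumptions: ObjectDeps, begin_write(), read() and finish_write() are hypothetical names, everything runs on a single thread (as a PG does on its Seastar shard), and it does not describe what the current OSD or Crimson code actually does.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-object dependency tracker (single-threaded model).
class ObjectDeps {
public:
    // Record that a write to `oid` has been handed to the object store.
    void begin_write(const std::string& oid) { ++inflight_[oid]; }

    // Run `fn` now if `oid` has no write in flight; otherwise park it.
    void read(const std::string& oid, std::function<void()> fn) {
        if (inflight_.count(oid) == 0) {
            fn();
        } else {
            parked_[oid].push_back(std::move(fn));
        }
    }

    // Record that one write to `oid` has completed; once none remain,
    // release the parked reads for that object in arrival order.
    void finish_write(const std::string& oid) {
        auto it = inflight_.find(oid);
        assert(it != inflight_.end() && "finish_write without begin_write");
        if (--it->second > 0) {
            return;  // other writes to this object are still in flight
        }
        inflight_.erase(it);
        auto parked = parked_.find(oid);
        if (parked == parked_.end()) {
            return;  // nobody was waiting on this object
        }
        std::deque<std::function<void()>> ready = std::move(parked->second);
        parked_.erase(parked);
        for (auto& fn : ready) {
            fn();
        }
    }

private:
    std::map<std::string, int> inflight_;  // object id -> writes in flight
    std::map<std::string, std::deque<std::function<void()>>> parked_;
};
```

With a write to object "a" in flight, a read of "b" proceeds at once while a read of "a" is parked until finish_write("a") is called. This model does not order a later write behind an already parked read; a real implementation would have to decide how to handle that case.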