RE: seastar crimson --- pglock solution discussion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Seems we can use seastar::with_lock(shared_mutex  sm)  to instead of pglock in crimson-osd. 



> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Liu, Chunmei
> Sent: Thursday, December 20, 2018 3:30 PM
> To: 'Gregory Farnum' <gfarnum@xxxxxxxxxx>
> Cc: The Esoteric Order of the Squid Cybernetic <ceph-devel@xxxxxxxxxxxxxxx>;
> Kefu Chai <kchai@xxxxxxxxxx>; Cheng, Yingxin <yingxin.cheng@xxxxxxxxx>; Ma,
> Jianpeng <jianpeng.ma@xxxxxxxxx>; Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx>
> Subject: RE: seastar crimson --- pglock solution discussion
> 
> 
> 
> > -----Original Message-----
> > From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
> > Sent: Thursday, December 20, 2018 3:09 PM
> > To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> > Cc: The Esoteric Order of the Squid Cybernetic
> > <ceph-devel@xxxxxxxxxxxxxxx>; Kefu Chai <kchai@xxxxxxxxxx>; Cheng,
> > Yingxin <yingxin.cheng@xxxxxxxxx>; Ma, Jianpeng
> > <jianpeng.ma@xxxxxxxxx>; Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx>
> > Subject: Re: seastar crimson --- pglock solution discussion
> >
> > On Thu, Dec 20, 2018 at 2:59 PM Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
> > > > Sent: Wednesday, December 19, 2018 3:15 PM
> > > > To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> > > > Cc: The Esoteric Order of the Squid Cybernetic
> > > > <ceph-devel@xxxxxxxxxxxxxxx>; Kefu Chai <kchai@xxxxxxxxxx>; Cheng,
> > > > Yingxin <yingxin.cheng@xxxxxxxxx>; Ma, Jianpeng
> > > > <jianpeng.ma@xxxxxxxxx>; Radoslaw Zarzynski <rzarzyns@xxxxxxxxxx>
> > > > Subject: Re: seastar crimson --- pglock solution discussion
> > > >
> > > > On Mon, Dec 17, 2018 at 4:23 PM Liu, Chunmei
> > > > <chunmei.liu@xxxxxxxxx>
> > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > >     In order to keep IO request sequence in one pg, osd use
> > > > > pglock to guarantee
> > > > the sequence.  Here in Crimson, it is lockless, so we use
> > > > future/promise to do the same work.
> > > > >
> > > > >     We can design Each PG has its own IO request queue in
> > > > > seastar-crimson shard. And each PG has one member
> > > > > seastar::promise<> pg_ready;
> > > > >
> > > > >    When need pglock.lock(),  we use the following logic to instead:
> > > > >
> > > > >                                     return pg_ready.get_future()
> > > > //after satisfy the pg_ready promise later then the future will be
> > > > fulfilled
> > here.
> > > > >                                        .then([this] {
> > > > >                                             Pg_ready = seastar::promise<>{};
> //
> > set
> > > > promise pg_ready no future.
> > > > >                                                      Dequeue io
> > > > > from pg's request queue and do osd
> > > > following process.
> > > > >                                         });
> > > > >
> > > > >   When need pglock.unlock(), we use the following logic to instead:
> > > > >                                     then_wrapped([this] (auto fut) {
> > > > >                                        fut.forward_to(std::move(pg_ready));     // satisfy
> the
> > > > pg_ready promise
> > > > >                                     }); So the next IO request
> > > > > in the PG queue will not be dequeued until the pg_ready promise
> > > > > is satisfied after the
> > > > prior request has already been processed in OSD.
> > > > >
> > > > > Do you think it is workable?
> > > >
> > > > Have we considered *not* using a "global" pglock and instead
> > > > tracking dependencies more carefully?
> > > >
> > > > IIRC, in the current model we use the pg lock for two different
> > > > kinds of things
> > > > 1) to prevent mutating in-memory state incorrectly across racing
> > > > threads,
> > > > 2) to provide ordering of certain kinds of operations (eg, reads
> > > > of in-progress
> > > > writes)
> > >
> > > Another 3) pglog need to be sequenced, first IO request pglog should
> > > write first, for replicators consistency. (use pglog head/tail
> > > pointer to do recovery)
> >
> > Right.
> >
> > > > In Seastar, we shouldn't need to worry about (1) at all.
> > >
> > > Yes, that is correct. Since each pg only belong to one seastar thread.
> > >
> > > > (2) is of course more tricky, but it seems like we ought to be
> > > > able to do tracking more easily so as to condition dependencies
> > > > explicitly on the dependency. For instance, we can condition a
> > > > write operation being applied to the object store on its preceding
> > > > pg log operation being done; we can condition reads proceeding on
> > > > not having a
> > write to the same object in progress, etc.
> > > >
> > > How to do the condition in crimson?  Can you give an example here?
> > >
> > > (3) Since Crimson code run in async mode, how to grantee pglog write
> > > in
> > sequence?
> >
> > I haven't worked with the Crimson code directly, but I assume we'd
> > have some kind of sequencer, and that there are pre-existing futures
> > around the operations being completed or stored on disk.
> >
> > So couldn't we get those futures back when getting a pglog, and
> > condition our own steps on those being done at the right points? Or
> > would that be too expensive to track?
> 
> What I suggested on above is use future/promise to guarantee the sequence
> which is not blocked but get the same result as pglock.  After the first IO request
> send to ObjectStore, satisfy the promise, then the code get the future back and
> dequeuer the next IO request from the same PG queue.
> But for write/read one Object, we need consider more, since write is handled by
> ObjectStore and read is done by PG layer. Before ceph osd use
> ondisk_write_lock and ondisk_read_lock to guarantee the write/read one object
> sequence, but current code no those locks, I am not sure what mechanism used
> now. Will Check the code.
> 
> 
> > -Greg




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux