Re: About the optimization of rbd object map

On Tue, Dec 10, 2019 at 11:02 PM Li Wang <laurence.liwang@xxxxxxxxx> wrote:
>
> Hi Jason,
>   Would it be possible to do the following optimization?
>   (1) For writes, update the in-memory map first, then write the data and
> update the on-disk map asynchronously, so the first-write performance
> problem goes away.
>   (2) On rbd open, after the exclusive lock is acquired and before loading
> the map, write a MAP_IN_USE flag into the rbd header.
>   (3) Before releasing the exclusive lock, flush pending map writes and
> clear the flag.
>   (4) On rbd open, if the flag is already set before the map is loaded,
> discard and rebuild the map.
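A minimal Python sketch of the protocol proposed in steps (1)-(4), assuming a toy in-memory store in place of RADOS. `Store`, `Image`, `map_in_use`, `state` and the other names are hypothetical and are not librbd APIs; the asynchronous map persistence is simplified to a batch flushed on close. A "crash" is simulated by opening a second `Image` on the same `Store` without calling `close()`.

```python
from dataclasses import dataclass, field

EXIST, NOTEXIST = "EXIST", "NOTEXIST"


@dataclass
class Store:
    """Toy stand-in for RADOS: everything here survives a simulated crash."""
    data_objects: set = field(default_factory=set)     # objects that really exist
    persisted_map: dict = field(default_factory=dict)  # on-disk object map
    map_in_use: bool = False                           # hypothetical header flag


class Image:
    def __init__(self, store, num_objects):
        self.store = store
        self.num_objects = num_objects
        self.map = {}         # in-memory object map
        self.pending = set()  # map updates not yet persisted
        self.rebuilt = False

    def open(self):
        # Steps (2) and (4), after the exclusive lock is acquired:
        # a set flag means the last owner died with unflushed map updates.
        if self.store.map_in_use:
            self.map = self._rebuild()
            self.store.persisted_map = dict(self.map)
            self.rebuilt = True
        else:
            self.map = dict(self.store.persisted_map)
        self.store.map_in_use = True

    def write(self, obj_no):
        # Step (1): in-memory map first, then data; the map write is deferred.
        self.map[obj_no] = EXIST
        self.store.data_objects.add(obj_no)
        self.pending.add(obj_no)

    def close(self):
        # Step (3): flush deferred map writes, then clear the flag.
        for obj_no in self.pending:
            self.store.persisted_map[obj_no] = self.map[obj_no]
        self.pending.clear()
        self.store.map_in_use = False

    def state(self, obj_no):
        return self.map.get(obj_no, NOTEXIST)

    def _rebuild(self):
        # O(num_objects): probe every object slot instead of trusting the map.
        return {n: EXIST for n in range(self.num_objects) if n in self.store.data_objects}


store = Store()
first = Image(store, num_objects=8)
first.open()
first.write(3)          # crash here: close() never runs
second = Image(store, num_objects=8)
second.open()
print(second.rebuilt, second.state(3))  # True EXIST
```

The sketch also shows the catch: correctness depends on every opener performing the check in step (4). A client that does not know about the flag would load the stale on-disk map and trust it, which is why the reply says a new feature bit would be required.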

Changing the behaviour like this would break backwards compatibility
with older clients. Therefore, it would really need a new feature bit
to describe "object-map v2". Rebuilding the map on a large image is
not a "free" operation since you might have to loop through tens of
thousands of objects. That could be quite the unexpected surprise for
a user attempting to restart a failed VM.
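The rebuild cost can be roughly quantified. Assuming the default 4 MiB RBD object size, the number of object slots a rebuild has to examine is the image size divided by the object size: 25,600 for a 100 GiB image and 262,144 for a 1 TiB image. The sketch below turns that into a rough time estimate; `object_count` and `rebuild_seconds` are hypothetical helpers, and the 1 ms per probe and the concurrency value are made-up parameters, not measured librbd numbers.

```python
MiB = 1 << 20
GiB = 1 << 30
DEFAULT_OBJECT_SIZE = 4 * MiB  # RBD default object size (order 22)


def object_count(image_size, object_size=DEFAULT_OBJECT_SIZE):
    # Ceiling division: a partially used trailing object still has to be checked.
    return -(-image_size // object_size)


def rebuild_seconds(image_size, probe_ms=1.0, concurrency=1,
                    object_size=DEFAULT_OBJECT_SIZE):
    # Hypothetical cost model: one existence probe per object slot,
    # issued in waves of `concurrency`, each wave taking `probe_ms`.
    waves = -(-object_count(image_size, object_size) // concurrency)
    return waves * probe_ms / 1000.0


for size_gib in (10, 100, 1024):
    n = object_count(size_gib * GiB)
    print(f"{size_gib} GiB: {n} objects, ~{rebuild_seconds(size_gib * GiB):.1f} s at 1 ms/probe")
```

Even with generous concurrency, this work sits on the image-open path under the proposal, which is what makes it a surprise when restarting a failed VM.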

> Cheers,
> Li Wang
>
> On Mon, Dec 9, 2019 at 9:42 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > On Mon, Dec 9, 2019 at 8:19 AM Li Wang <laurence.liwang@xxxxxxxxx> wrote:
> > >
> > > Hi Jason,
> > >   If the object map is updated to mark an object EXIST before the first
> > > write to that object, what happens if a crash occurs after the object map
> > > write but before the data write? Will the map then wrongly mark the object
> > > EXIST when it is in fact NOTEXIST? In other words, is the map subject to
> > > the following semantics: if an object
> >
> > That's not an issue that would result in an object leak or data
> > corruption. If the object-map flags the object as existing when it
> > doesn't, due to an untimely crash, it will just result in an unnecessary
> > read IO, or an unnecessary delete request when removing the image.
> >
> > > is NOTEXIST in the map, it really does not exist; if an object is EXIST
> > > in the map, it does not necessarily exist. A read/write to such an object
> > > will return ENOENT, and the client will read from the parent (or copy up
> > > from the parent and then write), so that is not a problem.
> > > If the above understanding is correct, what about diff computation? Will
> > > the wrong indication
> >
> > Yes, it will be wrong for the affected object, so your diff will
> > potentially include an extra object in the delta (but no data
> > corruption). The object-map can be rebuilt using the CLI, but there
> > really shouldn't be a need for such a corner case (that is just
> > slightly sub-optimal).
> >
> > > in the map cause a problem? We are also wondering what the negative
> > > impacts of disabling the object map would be.
> > >
> > > Cheers,
> > > Li Wang
> > >
> > > On Fri, Dec 6, 2019 at 9:56 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Dec 5, 2019 at 11:14 PM Li Wang <laurence.liwang@xxxxxxxxx> wrote:
> > > > >
> > > > > Hi Jason,
> > > > >   We found that the synchronous object map update, which results in
> > > > > two object writes for every write, slows down the first-write
> > > > > performance of a newly created rbd image by up to 10x. That is not
> > > > > acceptable in our scenario. Could we optimize it, for example by
> > > > > batching the map writes or updating the map lazily? Do we need to
> > > > > maintain accurate synchronization between the map and the data
> > > > > objects? From a glance at the librbd code, it seems there is no
> > > > > transactional design covering the two object writes (map object
> > > > > and data object). Is that right?
> > > >
> > > > If you don't update the object-map before issuing the first write to
> > > > the associated object, you could crash and therefore the object-map's
> > > > state is worthless since you couldn't trust it to tell the truth. The
> > > > cost of object-map is supposed to be amortized over time so the first
> > > > writes on a new image will incur the performance hits, but future
> > > > writes do not.
> > > >
> > > > The good news is that you are more than welcome to disable
> > > > object-map/fast-diff if the performance penalty is too great for your
> > > > application -- it's not a required feature of RBD.
> > > >
> > > > >
> > > > > Cheers,
> > > > > Li Wang
> > > > >
> > > >
> > > >
> > > > --
> > > > Jason
> > > >
> > >
> >
> >
> > --
> > Jason
> >
>
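The ordering rule and crash semantics discussed in the earlier messages can be illustrated with a toy model. This is not librbd code: `ToyImage`, `CrashBetween`, and the method names are hypothetical, and EXIST/NOTEXIST simply reuse the thread's terms. With the map updated first, a crash between the two writes leaves the map as a superset of the real objects, so a read of the stale slot costs one wasted IO and a map-based diff reports one extra object, but nothing is lost. Writing the data first instead (`data_first=True`) can leave a real object flagged NOTEXIST, after which the map can no longer be trusted.

```python
EXIST, NOTEXIST = "EXIST", "NOTEXIST"


class CrashBetween(Exception):
    """Simulated crash between the map write and the data write."""


class ToyImage:
    def __init__(self, num_objects):
        self.object_map = [NOTEXIST] * num_objects  # persisted object map
        self.objects = {}  # obj_no -> bytes; stands in for the data objects
        self.ios = 0       # data-object reads actually issued

    def _mark(self, obj_no, data):
        self.object_map[obj_no] = EXIST

    def _store(self, obj_no, data):
        self.objects[obj_no] = data

    def write(self, obj_no, data, crash_between=False, data_first=False):
        # Current rule: persist the map update BEFORE the data write.
        # data_first=True models the unsafe alternative for comparison.
        first, second = (self._store, self._mark) if data_first else (self._mark, self._store)
        first(obj_no, data)
        if crash_between:
            raise CrashBetween(obj_no)
        second(obj_no, data)

    def read(self, obj_no):
        if self.object_map[obj_no] == NOTEXIST:
            return None                  # trusted: no IO; read the parent or zero-fill
        self.ios += 1                    # EXIST is only a hint, so an IO is issued
        return self.objects.get(obj_no)  # None models an ENOENT reply

    def invariant_holds(self):
        # Every object that really exists must be flagged EXIST in the map.
        return all(self.object_map[n] == EXIST for n in self.objects)

    def diff_since_creation(self):
        # A map-based diff reports every EXIST slot, stale ones included.
        return [n for n, state in enumerate(self.object_map) if state == EXIST]


img = ToyImage(4)
img.write(0, b"a")
try:
    img.write(2, b"b", crash_between=True)
except CrashBetween:
    pass
print(img.invariant_holds(), img.diff_since_creation(), sorted(img.objects))  # True [0, 2] [0]
```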


-- 
Jason
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



