On Tue, Dec 10, 2019 at 11:02 PM Li Wang <laurence.liwang@xxxxxxxxx> wrote:
>
> Hi Jason,
>   Is it possible to do the following optimization?
> (1) For write, update the in-memory map first, then write the data and
> asynchronously update the map,
> so there is no first-write performance problem
> (2) For rbd open, after the exclusive lock is acquired and before loading the map,
> write a flag MAP_IN_USE into the rbd header
> (3) Before releasing the exclusive lock, flush pending map writes and clear the flag
> (4) For rbd open, if the flag exists before loading the map, discard and
> rebuild the map

Changing the behaviour like this would break backwards compatibility
with older clients. Therefore, it would really need a new feature bit
to describe "object-map v2". Rebuilding the map on a large image is
not a "free" operation since you might have to loop through tens of
thousands of objects. That could be quite the unexpected surprise for
a user attempting to restart a failed VM.

> Cheers,
> Li Wang
>
> On Mon, Dec 9, 2019 at 9:42 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > On Mon, Dec 9, 2019 at 8:19 AM Li Wang <laurence.liwang@xxxxxxxxx> wrote:
> > >
> > > Hi Jason,
> > >   If, before the first write to an object, the object map is updated first
> > > to indicate
> > > the object EXIST, what happens if a crash occurs before the data write and after
> > > the object map write? Will the map wrongly indicate an object EXIST when in fact
> > > it is NOTEXIST? In other words, the map is subject to the following semantics:
> > > if an object
> >
> > That's not an issue that would result in an object leak or data
> > corruption. If the object-map flags the object as existing when it
> > doesn't due to an untimely crash, it will either do an unnecessary
> > read IO or delete request when removing the image.
> >
> > > is NOTEXIST in the map, it REALLY does not exist. If an object is EXIST in the map,
> > > it does not necessarily exist.
> > > The read/write to such an object will return ENOENT,
> > > and the client will read from the parent/copy up from the parent and then write, so
> > > that is not a problem.
> > > If the above understanding is correct, how about diff computation,
> > > will the wrong indication
> >
> > Yes, it will be wrong for the affected object so your diff will
> > potentially include an extra object on the delta (but no data
> > corruption). The object-map can be re-built using the CLI, but there
> > really shouldn't be a need for such a corner case (that is just
> > slightly sub-optimal).
> >
> > > in the map cause a problem? And we are wondering what the negative impacts are
> > > of disabling the object map.
> > >
> > > Cheers,
> > > Li Wang
> > >
> > > On Fri, Dec 6, 2019 at 9:56 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Dec 5, 2019 at 11:14 PM Li Wang <laurence.liwang@xxxxxxxxx> wrote:
> > > > >
> > > > > Hi Jason,
> > > > >   We found that the synchronous object map update, which results in
> > > > > writing two objects
> > > > > on every write, slows down the first-write performance of a newly
> > > > > created rbd by up to 10x,
> > > > > which is not acceptable in our scenario. Could we do some
> > > > > optimizations on it,
> > > > > for example, batch the map writes or lazily update the map? Do we need to
> > > > > maintain accurate
> > > > > synchronization between the map and the data objects? After a
> > > > > glimpse of the librbd code,
> > > > > there seems to be no transactional design for the two object writes (map object and
> > > > > data object).
> > > >
> > > > If you don't update the object-map before issuing the first write to
> > > > the associated object, you could crash and therefore the object-map's
> > > > state is worthless since you couldn't trust it to tell the truth. The
> > > > cost of object-map is supposed to be amortized over time so the first
> > > > writes on a new image will incur the performance hits, but future
> > > > writes do not.
> > > >
> > > > The good news is that you are more than welcome to disable
> > > > object-map/fast-diff if the performance penalty is too great for your
> > > > application -- it's not a required feature of RBD.
> > > >
> > > >
> > > > > Cheers,
> > > > > Li Wang
> > > > >
> > > >
> > > >
> > > > --
> > > > Jason
> > > >
> > >
> >
> >
> > --
> > Jason
> >
>

--
Jason
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
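
The ordering argument in this thread can be illustrated with a toy model. The sketch below is not librbd code: `ToyImage`, `Crash`, and `run` are hypothetical names invented for illustration, and the "object map" is just an in-memory set. It only shows why updating the map before the first data write leaves, at worst, a false positive after a crash, while updating it lazily can leave an object that the map claims does not exist.

```python
class Crash(Exception):
    """Simulated client crash between the two steps of a write."""


class ToyImage:
    """Toy in-memory model of an image with an object map (not the librbd API)."""

    def __init__(self, map_first=True):
        self.map_first = map_first  # True: update the map before the data write
        self.object_map = set()     # object numbers flagged as existing
        self.objects = {}           # object number -> data actually stored

    def write(self, objno, data, crash_between=False):
        steps = [
            lambda: self.object_map.add(objno),
            lambda: self.objects.__setitem__(objno, data),
        ]
        if not self.map_first:
            steps.reverse()  # lazy variant: data first, map afterwards
        steps[0]()
        if crash_between:
            raise Crash()
        steps[1]()

    def map_is_trustworthy(self):
        # Safe direction: everything the map calls non-existent really is absent.
        return all(objno in self.object_map for objno in self.objects)

    def false_positives(self):
        # Flagged as existing but never written: costs an extra read or delete.
        return self.object_map - set(self.objects)


def run(map_first):
    image = ToyImage(map_first=map_first)
    image.write(0, b"a")
    try:
        image.write(1, b"b", crash_between=True)
    except Crash:
        pass
    return image.map_is_trustworthy(), sorted(image.false_positives())


if __name__ == "__main__":
    print("map first:", run(True))
    print("data first:", run(False))
```

In this model, the map-first run ends with a trustworthy map and one harmless false positive, whereas the data-first run ends with an object on disk that the map does not know about, which is the case described above as making the map's state worthless.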