Appending to a rados object with feedback

Hello Ceph users,

In our application, we have a use case for appending to a rados object in such a way that the client knows afterwards at what offset the append happened, even while other clients may be appending to the same object concurrently.

At first I thought the client could use a write op for this purpose, since a write op allows multiple OSD operations to be applied atomically. However, my understanding is that a successful write op cannot return any data, so one cannot stat the object, append, and return the size obtained from the stat (which would be guaranteed to be the append offset) in a single op. Instead, the following algorithm can be used:

1. client stats the object to get its size
2. client issues an atomic write op which first verifies that the size is still what it was in step 1 and, if so, appends the data. If not, the write op fails and the client returns to step 1.
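The two steps above can be modeled with a small in-memory simulation. This is a sketch under stated assumptions, not librados code: SimObject and its methods are hypothetical, each method stands for one client-to-OSD round trip, and a mutex stands in for the OSD applying each op atomically. append_if_size() is the size check that a plain write op does not seem to be able to express.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>

// Hypothetical in-memory stand-in for one rados object (NOT librados).
// Each public method models one client->OSD round trip; the mutex models
// the OSD applying each op atomically.
class SimObject {
 public:
  // Step 1: stat. Returns the current object size.
  uint64_t stat() {
    std::lock_guard<std::mutex> lk(mu_);
    return data_.size();
  }

  // Step 2: atomic "verify size == expected, then append".
  // Returns false and writes nothing if the size has changed.
  bool append_if_size(uint64_t expected, const std::string& buf) {
    std::lock_guard<std::mutex> lk(mu_);
    if (data_.size() != expected) return false;
    data_ += buf;
    return true;
  }

  std::string contents() {
    std::lock_guard<std::mutex> lk(mu_);
    return data_;
  }

 private:
  std::mutex mu_;
  std::string data_;
};

// Client side: repeat steps 1 and 2 until the conditional append succeeds,
// then report the offset at which the data landed.
uint64_t append_with_offset(SimObject& obj, const std::string& buf) {
  for (;;) {
    uint64_t off = obj.stat();           // step 1
    if (obj.append_if_size(off, buf)) {  // step 2
      return off;                        // guaranteed to be the append offset
    }
    // Another client appended in between: go back to step 1.
  }
}
```

The loop finishes as soon as no other client appends between the stat and the conditional append; under heavy contention a client may have to retry several times.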

But while rados_write_op_cmpxattr() offers a similar validation feature for xattrs, there does not seem to be a way to validate the size of an object in a write op.

To get around this, we wrote a Ceph class to implement step 2 above. It takes an offset and some data as input, and appends the data to the object only if the offset matches the object's size.
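The check such a class method performs can be sketched independently of the Ceph class API. In this hypothetical model, ObjState represents the object (std::nullopt meaning it does not exist yet), and append_at() treats a missing object as having size 0. In the real class the size would come from cls_cxx_stat() and the data would presumably be written with cls_cxx_write(); we assume a missing object shows up there as -ENOENT, and returning -ERANGE on a mismatch is an arbitrary choice for this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the object the method operates on.
// std::nullopt means "object does not exist yet".
using ObjState = std::optional<std::string>;

// Append `data` only if `offset` equals the object's current size.
// A missing object counts as size 0 and is created by the append.
// Returns 0 on success, or a negative errno on mismatch
// (-ERANGE here is an arbitrary choice for this sketch).
int append_at(ObjState& obj, uint64_t offset, const std::string& data) {
  uint64_t size = obj ? obj->size() : 0;  // "stat"; missing object => 0
  if (offset != size) return -ERANGE;     // another client appended first
  if (!obj) obj.emplace();                // create on first append
  obj->append(data);
  return 0;
}
```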

Did we miss another, simpler way of doing this? Is using a class a good idea in this case?

By the way, I have a question about the class. Following the example of method record_hello in cls_hello.cc, our method calls cls_cxx_stat() and yet is declared CLS_METHOD_WR, not CLS_METHOD_RD|CLS_METHOD_WR. Is stat'ing an object not considered reading it? Why does the method not need the CLS_METHOD_RD flag? I tried including that flag to see what would happen, but then my method could no longer create new objects, which we want to support (with the same meaning as appending to a 0-size object). It seems that in that case Ceph asserts that the object exists before calling the method.

We also briefly tried an alternative using locking (rados_lock_exclusive(), rados_stat(), rados_append(), rados_unlock()), but I felt that wasn't as good a solution: lock calls don't block waiting for the lock to be acquired, a lock can remain stuck if a client terminates abnormally, and the approach involves more round trips between client and server anyway.
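The round-trip cost of the lock-based variant can be made concrete with another hypothetical in-memory model (again not librados): an atomic flag stands in for the exclusive lock, each step is counted as one round trip, and acquisition is a non-blocking attempt that the client has to repeat. The max_attempts parameter is our own addition so that the stuck-lock case terminates.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical in-memory model of the lock-based variant (NOT librados).
struct LockedObject {
  std::atomic<bool> held{false};  // stands in for the exclusive lock
  std::string data;               // only touched while we hold the lock
};

struct LockedAppend {
  bool ok;          // false if the lock was never acquired
  uint64_t offset;  // valid only when ok
  int round_trips;  // client<->OSD calls made
};

// lock, stat, append, unlock. Acquisition never waits, so the client must
// poll; a lock left behind by a crashed client makes every attempt fail.
LockedAppend append_with_lock(LockedObject& obj, const std::string& buf,
                              int max_attempts) {
  int trips = 0;
  bool got = false;
  for (int i = 0; i < max_attempts && !got; ++i) {
    ++trips;                         // one lock attempt
    got = !obj.held.exchange(true);  // succeeds only if it was free
  }
  if (!got) return {false, 0, trips};

  uint64_t off = obj.data.size();
  ++trips;                 // stat
  obj.data += buf;
  ++trips;                 // append
  obj.held.store(false);
  ++trips;                 // unlock
  return {true, off, trips};
}
```

Even uncontended, this takes four round trips, versus two (stat, then the conditional append) for the class-based approach, and a lock left behind by a crashed client makes every attempt fail.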

Finally, is native support for this feature something that the Ceph team would consider including?

-kv
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



