Hello Ceph users,
In our application, we found that we have a use case for appending to a
rados object in such a way that the client knows afterwards at what
offset the append happened, even while there may be other concurrent
clients doing the same thing.
At first I thought the client might use a write op for this purpose,
which allows multiple OSD operations to happen atomically. My
understanding is that successful write ops cannot return any data, so
one cannot, within a single op, stat the object, append, and then return
the size obtained from the stat (which would be guaranteed to be the
append offset). Instead, the following algorithm can be used:
1. client stats the object to get its size
2. client issues an atomic write op which first verifies that the size
is still equal to what it was in step 1 and, if so, appends the data.
If not, the write op fails and the client returns to step 1.
But while there exists rados_write_op_cmpxattr() which offers a similar
validation feature for xattrs, there does not seem to be a way to
validate the size of an object in a write op.
To get around this, we wrote a Ceph class to implement step 2 above. It
takes an offset and some data as input, and appends the data to the
object only if the offset matches the object's size.
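The algorithm and the class method described above can be sketched as follows. This is only an in-memory model, not librados or objclass code: ModelObject, append_if_size() and append_and_get_offset() are hypothetical names, the mutex stands in for the OSD executing one op atomically, and -ECANCELED is an arbitrary choice of error code for a size mismatch.

```cpp
#include <cerrno>
#include <cstdint>
#include <mutex>
#include <string>

// In-memory stand-in for a single RADOS object (hypothetical, not librados).
struct ModelObject {
    std::mutex mu;      // models the OSD applying one op at a time
    std::string bytes;  // object contents

    // Model of step 1: return the current object size.
    uint64_t stat() {
        std::lock_guard<std::mutex> g(mu);
        return bytes.size();
    }

    // Model of step 2 (the class method): append only if the object
    // size still equals expected_size. Returns 0 on success and
    // -ECANCELED (chosen for illustration) on a mismatch.
    int append_if_size(uint64_t expected_size, const std::string& data) {
        std::lock_guard<std::mutex> g(mu);
        if (bytes.size() != expected_size)
            return -ECANCELED;
        bytes += data;
        return 0;
    }
};

// Client side: repeat stat + conditional append until it succeeds,
// then report the offset at which the data landed.
uint64_t append_and_get_offset(ModelObject& obj, const std::string& data) {
    for (;;) {
        uint64_t size = obj.stat();               // step 1
        if (obj.append_if_size(size, data) == 0)  // step 2
            return size;
        // Another client appended in between: go back to step 1.
    }
}
```

In this model, appending "hello" and then "world!" to an empty object returns offsets 0 and 5, and a later append_if_size(0, ...) is rejected because the size is no longer 0.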
Did we miss another, simpler way of doing this? Is using a class a good
idea in this case?
By the way, I have a question about the class. Following the example in
cls_hello.cc (method record_hello), our method calls cls_cxx_stat() and
yet is declared CLS_METHOD_WR, not CLS_METHOD_RD|CLS_METHOD_WR. Is
stat-ing an object not considered reading it? How come the method does
not need the CLS_METHOD_RD flag? I tried including that flag to see what
would happen but then my method was unable to create new objects, which
we want to support with the same meaning as appending to a 0-size
object. It seems that in that case Ceph asserts that the object exists
before calling the method.
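To make the intended create-on-append semantics concrete, here is a small model of the method body. It assumes that the stat call reports a missing object with -ENOENT. ModelStore, model_stat() and model_append_at() are hypothetical names used for illustration, not the objclass API, and -ECANCELED is again an arbitrary error code.

```cpp
#include <cerrno>
#include <cstdint>
#include <string>

// Hypothetical in-memory model of one object that may not exist yet.
struct ModelStore {
    bool exists = false;
    std::string bytes;
};

// Model of a stat call: fails with -ENOENT when the object is missing.
int model_stat(const ModelStore& obj, uint64_t* size) {
    if (!obj.exists)
        return -ENOENT;
    *size = obj.bytes.size();
    return 0;
}

// Model of the method body: a missing object counts as size 0, so an
// append at offset 0 creates it. Any other offset must match the size.
int model_append_at(ModelStore& obj, uint64_t expected,
                    const std::string& data) {
    uint64_t size = 0;
    int r = model_stat(obj, &size);
    if (r == -ENOENT)
        size = 0;               // treat "missing" as an empty object
    else if (r < 0)
        return r;               // pass any other error up unchanged
    if (size != expected)
        return -ECANCELED;      // offset does not match: caller retries
    obj.exists = true;          // create on first successful append
    obj.bytes += data;
    return 0;
}
```

With this logic, an append at offset 0 to a missing object creates it, while an append at any other offset to a missing object is rejected without creating anything.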
We also briefly tried an alternative method using locking:
rados_lock_exclusive(), rados_stat(), rados_append(), rados_unlock() but
I felt that wasn't as good a solution because locks don't block
waiting to be acquired, can remain stuck if a client terminates
abnormally, and that solution involves more round trips between the
client and server anyway.
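For comparison, the locking sequence can be modelled the same way. This is a simplified in-memory sketch, not librados: LockedModelObject and locked_append() are hypothetical, every call is counted as one round trip, and a held lock is assumed to be refused immediately with -EBUSY instead of making the caller wait.

```cpp
#include <cerrno>
#include <cstdint>
#include <string>

// Hypothetical model of an object with an advisory exclusive lock.
struct LockedModelObject {
    std::string bytes;
    std::string lock_holder;  // empty when unlocked
    int round_trips = 0;      // one per modelled client/server exchange

    // Model simplification: any held lock is refused at once.
    int lock_exclusive(const std::string& who) {
        ++round_trips;
        if (!lock_holder.empty())
            return -EBUSY;    // no waiting: the caller has to poll
        lock_holder = who;
        return 0;
    }
    uint64_t stat() { ++round_trips; return bytes.size(); }
    void append(const std::string& data) { ++round_trips; bytes += data; }
    void unlock(const std::string& who) {
        ++round_trips;
        if (lock_holder == who)
            lock_holder.clear();
    }
};

// lock, stat, append, unlock: four exchanges for one append.
int locked_append(LockedModelObject& obj, const std::string& who,
                  const std::string& data, uint64_t* offset) {
    int r = obj.lock_exclusive(who);
    if (r < 0)
        return r;             // lock is held elsewhere, try again later
    *offset = obj.stat();
    obj.append(data);
    obj.unlock(who);
    return 0;
}
```

In this model an uncontended append costs four round trips, against two for the stat plus conditional append approach, and a holder that never calls unlock() keeps every other client at -EBUSY.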
Finally, is native support for this feature something that the Ceph team
would consider including?
-kv