On 11/17/2012 12:13 PM, Noah Watkins wrote:
> The Hadoop VFS layer assumes that block size and replication can be set
> on a per-file basis, which is important to users for file layout/workload
> optimizations. The libcephfs interface doesn't make this entirely easy.
> Here is one approach, but it isn't thread safe, as the default values
> are global variables in the client:
>
>   orig_obj_size = ceph_get_default_object_size()  // save
>   set_default_object_size(new_size)
>   open(path, O_CREAT)
>   set_default_object_size(orig_obj_size)          // restore
>
> Something more convenient might be:
>
>   ceph_open_layout(path, flags, mode, layout, replication)
I think this makes the most sense, since the layout of a file can't be
changed after it's been created, and this interface makes that the most
clear. It also avoids maintaining extra state in libcephfs between calls.

Since replication count is a per-pool setting, I think the Hadoop bindings
would have to translate from a VFS request to a pool with the requested
replication level. So something like this, where layout is a struct
containing stripe unit, stripe count, and object size (the subset of
struct ceph_file_layout related to objects that's currently useful):

  ceph_open_layout(path, flags, mode, layout, pool_name)

BTW, for anyone interested, there's a nice description of the layout
parameters here: http://ceph.com/docs/master/dev/file-striping/
> where layout and replication are used with O_CREAT | O_EXCL, or an
> interface for setting these values explicitly on newly created files:
>
>   ceph_open(path, O_CREAT|O_EXCL)
>   ceph_set_layout(path, layout, replication)
>
> where ceph_set_layout would presumably succeed only on zero-length
> files. Any thoughts on how to handle this?
>
> Thanks,
> Noah