Re: libcephfs create file with layout and replication

Sage Weil <sage@xxxxxxxxxxx> · Sat, 17 Nov 2012 15:23:32 -0800 (PST)

On Sat, 17 Nov 2012, Noah Watkins wrote:
> The Hadoop VFS layer assumes that block size and replication can be
> set on a per-file basis, which is important to users for file
> layout/workload optimizations.
> 
> The libcephfs interface doesn't make this entirely easy. Here is one
> approach, but it isn't thread safe as the default values are global
> variables in the client.
> 
>   orig_obj_size = ceph_get_default_object_size() //save
>   set_default_object_size(new size)
>   open(path, O_CREAT)
>   set_default_object_size(orig_obj_size) //reset
> 
> Something more convenient might be:
> 
>   ceph_open_layout(path, flags, mode, layout, replication)
> 
> where layout and replication are used with O_CREAT | O_EXCL, or an
> interface for setting these values explicitly on newly created files:
> 
>   ceph_open(path, O_CREAT|O_EXCL)
>   ceph_set_layout(path, layout, replication)

This is basically what we have now... at least that's how things work for 
the kernel client.  We should make sure there is a clean way via libcephfs 
to do that.

The client/mds protocol also allows you to specify the layout on file 
creation.  This is better since it has one less round trip to the MDS.  
Let's just create a new open call with those additional arguments.
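
To make the round-trip argument concrete, here is a rough sketch that counts 
requests against a mock MDS. Everything in it is hypothetical (struct 
file_layout, struct mock_mds, and both create_* helpers are made up for 
illustration and are not libcephfs or MDS interfaces); it only shows that 
create-then-set costs two requests where create-with-layout costs one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout description; field names are illustrative only. */
struct file_layout {
    uint32_t object_size;
    uint32_t stripe_unit;
    uint32_t stripe_count;
    const char *data_pool;
};

/* Mock MDS: counts requests instead of talking to a real cluster. */
struct mock_mds {
    unsigned requests;
    struct file_layout stored;
};

/* One request: create the file, optionally carrying a layout. */
void mds_create(struct mock_mds *mds, const struct file_layout *layout)
{
    mds->requests++;
    if (layout != NULL)
        mds->stored = *layout;
}

/* One request: set the layout on an already created (empty) file. */
void mds_set_layout(struct mock_mds *mds, const struct file_layout *layout)
{
    assert(layout != NULL);
    mds->requests++;
    mds->stored = *layout;
}

/* Create, then set the layout afterwards: two requests. */
unsigned create_then_set_layout(struct mock_mds *mds,
                                const struct file_layout *layout)
{
    mds_create(mds, NULL);
    mds_set_layout(mds, layout);
    return mds->requests;
}

/* Pass the layout along with the create: one request. */
unsigned create_with_layout(struct mock_mds *mds,
                            const struct file_layout *layout)
{
    mds_create(mds, layout);
    return mds->requests;
}
```

A new open call in libcephfs would play the role of create_with_layout 
here; its real name and argument list are still to be decided.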

FWIW, the striping parameters are object size, stripe unit, stripe count, 
and data pool.
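
For anyone unfamiliar with how the first three interact, below is a small 
sketch of the RAID-0 style mapping from a file offset to an object number 
and an offset within that object. The type and function names are made up 
for illustration, and the preconditions (nonzero stripe unit and stripe 
count, object size a multiple of the stripe unit) are assumptions of the 
sketch rather than a statement of what the MDS enforces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout struct. The data pool is left out because it picks
 * where objects are stored, not how file offsets map onto objects. */
struct file_layout {
    uint64_t object_size;
    uint64_t stripe_unit;
    uint64_t stripe_count;
};

struct object_extent {
    uint64_t object_no;  /* index of the object holding the byte */
    uint64_t offset;     /* byte offset inside that object */
};

struct object_extent map_offset(const struct file_layout *l, uint64_t off)
{
    /* Assumed preconditions for this sketch. */
    assert(l->stripe_unit > 0 && l->stripe_count > 0);
    assert(l->object_size >= l->stripe_unit);
    assert(l->object_size % l->stripe_unit == 0);

    uint64_t stripes_per_object = l->object_size / l->stripe_unit;
    uint64_t blockno = off / l->stripe_unit;          /* which stripe unit */
    uint64_t stripeno = blockno / l->stripe_count;    /* which stripe (row) */
    uint64_t stripepos = blockno % l->stripe_count;   /* column in the row */
    uint64_t objectsetno = stripeno / stripes_per_object;

    struct object_extent e;
    e.object_no = objectsetno * l->stripe_count + stripepos;
    e.offset = (stripeno % stripes_per_object) * l->stripe_unit
             + off % l->stripe_unit;
    return e;
}
```

With object size 8, stripe unit 4 and stripe count 2 (tiny numbers chosen 
only to keep the arithmetic readable), file offset 13 lands in object 1 at 
offset 5, and file offset 16 starts object 2 at offset 0.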

sage

> 
> where ceph_set_layout would succeed ostensibly on zero-length files.
> 
> Any thoughts on how to handle this?
> 
> Thanks,
> Noah
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 